REPLACEMENT SHEET 



Nucleic Acid Molecule Comprising A Nucleic Acid Sequence Which 
Codes For A Haemocyanin And Comprising At Least One Intron 
Sequence (Serial No. 10/049,988; Inventor: Markl Jurgen) 







Fig. 3a
Fig. 3b
Fig. 3c

[Chromatographic elution profiles: absorbance at 280 nm (A280) versus fraction number (0 to 70), with a 0 to 1 M NaCl gradient; panel labels include V8-protease and papain.]






Fig. 3d
Fig. 3e

[Chromatographic elution profiles: absorbance at 280 nm (A280) versus fraction number (0 to 70), with a 0 to 1 M NaCl gradient; panel label includes trypsin.]






Fig. 4a 

Genomic sequence of the HtH1 gene 



SIGNAL PEPTIDE SEQUENCE 1S-1 (1st part) 
GGCTTGTTCAGTTTCTACTCGTCGCCCTTGTG 
INTRON 1S-1/1S-2 (SEQ ID NO: 109) 

GTAAGTCAACGTCTTTGTTTTAAGTTTGATGCATATCTATCATTGCGTTTTAAAATACCA 
TTACAACCAACGTGTCTCTATTGGTCTTCACCTGTTTAACGTATATATTGTTTTTAATGT 
GAAAATCTGAGATTATTTTCATTTCCGTCAATATTCGTAAAATACTATACAAATAAAATT 
GCTTCAGCCTATTGCATTGGCAGTTTTCGCAGT^ATAACGAGGGAAGGCGTACATAAAATA 
TAAACCAGTGTATATTCAAGCATGTTTATAATTTCTTTATAGATTATAACATCATATCAA 
AACACCAATCTGGATTTAAACCCGTGAATCCAAAGTATACCAATTAACGGAACTTTATCA 
TGTTTTATCAAAGGTTTTAGATGAGGGTAAAGAAGTCCGAGCTATATTTTGCGATATCAG 
CAAAGCCTTCTATCACGTCTTGCACACAGGGCTGGTATCTAAACTCGAATCCACAGGAAT 
AAATATTTCAGCCGATAGAGAACAGTCGGTGGCTATCATTGGTCACAAAACAAGTCCAAA 
ATCTGCATTAGCCGGTGTTCCCCAAGGCTCTGTCTTGGGGCCACTATTATTTCTCACCTA 
TATAAACGATTCAACTAATGGAATATAAAGCAACGTAAACCTCACCGCAGATGAAACACT 
AAGTTATAGACAATCCGTTTAAAACCCAGCCACTGCTTAATAATGACTTAGGCCGTCTTT 
CAGACTGGGCTAGTAAGCGGCAGGTTAAATTTCACCTTGAAAAGACAGAAACCATGGTAT 
ATTTCAAAAACACGAATGCAAGTCCTAAACTTCAACTACTACTTGATGATACTGGGATTT 
CTAAAGTGTGTGAACAAAAACACATTGGCCTGATCCTACAAGATAACCAGACAGAAACCA 
TGTTTTTTTTCAATAACACGAATGCAAGTCCTAAACTTCAACTACTACTTGATGATACTG 
GGATTTCTAAAGTGTGTGGTGAACACAAACACCTTGGCCTGATTCTGCAAGATAATGGAA 
AATGTCAGAAACATAAGCAAGTTGATGTGGGGTTTTCTGGGGGTTGTGACAACACCGAAA 
GACCCTGCAACTAATGTTAGCTCAAAGGGTTTTACACCCGGTCACAAGTGGGGATCGACC 
CAGGCACCTTTTGCCTTTGACAGCTCGCCTTTCAAAAAATCTCAATTCGAAAACGAAATC 
TAATAATTTCATGAGCGATACAACCGTTTTTCATAATGCTGTGGTACCGCATACTGTGGA 
AACATCTGTCTACCCATTTGGTAGTCCCCCATAAAATGTATTTATGTTTATAAACACAAT 
GTTTATAGGGTTACAGTTAGAAGAAGCATTTCTATTGGCTAATGTACATTGCTTGTTTTT 
ACTATTGTGCAAAGGCATATTACAGGTCTTTTAGGAAATTAAATACTGTTTAAATCACAT 
ACACTACCGGTAATCCTATTATGCTTATCCTGCCAACATTCTGCCCAAGCAAACGCATGA 
AAGTTAAAGCTGAGTGTAAAATACTGATTGCTGTGTTACTTCACAACCAGTGGACTGAAT 
ACAACCATGTTTTTTCTTGAAAGTCACAAACATCCAGTCGGTTTCTAATGTGTTAAGTTT 
CTAGTTTCATAAAGAGCATGACGTAATGGTGAATAGGAGTTATCAATGTTTCTATCTAAT 
GACTCCTAGTTCGTTACTTTTTTAATAAAACATCCATGTGTTTAATGTTTGGCCACAGAT 
ATAACAAGAAAGAAATCGGATAAAATCTACATTTTGACCAATCGGAAGGCTGCCCCCTCC 
CTAATCCTAATCATTTTTGTGCCTCAAAACATACTCAACCAGACATTTGAACTATGTATA 
TATCAGAATGAAATGGTAACAATAAACTTGTATGTTGACCAGACAGAATTAGGGTGAATC 
TGAATACCAACTATTGTCACATATGAATATGGATAAGCTCTGCGCGTGCGTGCGGGCGGT 
GTAGTGCGTGTGTGTCTGTGTGTGTGTGTGTGTGCGTTTGTGTGTGTGTCTGCGTGCGTG 
TGTGTGCGCGTGTGTGCGTGTGTGTGTGTGTGCAGTGTGCCGAGTGTGTGTGTGTGTGTG 
TGCACAGACATGTGGTTGAGACACACTTGATTCAGTGCAGGATTATGTCCTTCAACCGAG 
TGTAGTCTTTAAGTGTGCCTGGAAACAAAAAACTGCGTTGGGTTGCATCGCCTCTGTAGC 
AAGCTTGGACGCGTCACGCAGCTCTGATACCACGTATTGGCACCATGTTTCATCGGTCTC 
ACGCGAATATTATGCTATGTGTGGCGTATCATACCATAGGTTGGGAACGTTTCAATACTG 
TACCGAGCTTGGGCGTGTCACAAAGCTATGATAAGATGACAACACGTCTTGGCATCTTGT 
TTCCTCGGTATCACGCGCTGTTATGCTATGTGTGGCTATCACACCTTAGGTTGGGAAAGT 
TTCCACATTTTCCAGCCTCGTACATGTTTCCTTTTGTTTTTTCCTTAGTTATCAGCATAC
CGTATATTCTATATTTAATGAGCATTTGTATTTTTCTACAG 

SIGNAL PEPTIDE SEQUENCE 1S-2 (2nd part) 

GTGGGGGCTGGAGCAG 
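
The two printed signal-peptide parts (1S-1 and 1S-2) can be joined, with intron 1S-1/1S-2 removed, and then translated. The Python sketch below is illustrative only. The helper names `splice` and `translate` are hypothetical, and the sketch assumes the standard genetic code. It also assumes the reading frame begins at the third printed base, because the leading GG is taken to complete a codon that starts upstream of the printed sequence. The trailing G of part 1S-2 is likewise assumed to begin the first codon of domain 1A-1, so it is left untranslated here.

```python
# Standard genetic code (NCBI translation table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def splice(exons):
    """Join exon sequences, i.e. the transcript with its introns removed."""
    return "".join(exons)


def translate(cds, offset=0):
    """Translate cds starting at offset; a trailing partial codon is ignored."""
    usable = cds[offset:]
    usable = usable[: len(usable) - len(usable) % 3]
    return "".join(CODON_TABLE[usable[p:p + 3]] for p in range(0, len(usable), 3))


# Signal peptide parts 1S-1 and 1S-2 as printed above.
exon_1s1 = "GGCTTGTTCAGTTTCTACTCGTCGCCCTTGTG"
exon_1s2 = "GTGGGGGCTGGAGCAG"

# Assumption: the first two bases (GG) finish a codon that starts upstream.
peptide = translate(splice([exon_1s1, exon_1s2]), offset=2)
print(peptide)  # LVQFLLVALVVGAGA
```

Setting `offset` to 0 or 1 gives an unrelated peptide. Comparing the output with the protein sequence is therefore a quick check of the assumed frame.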

INTRON 1S-2/1A-1 (SEQ ID NO: 110) 

GTGAGTTTCTTAACATTGTCATGGTACATGGATATACGCTCAGTGGGAAAGCAGGATATC 
CCCTTGGTTCAAGTATTCACTTGTCACGCCAAGTGTTCGATTCCCAACATGGAATACTGT 
CATATAGTAAATTGATACACTACTTACATTTAATTCTCCACTAAACGTCAACGTCCTTTA 
CTTCATGGCCCACATGGTCCGTATTAGTGAGTGAGTGAGTCAGGGCATAAGTATTTAACG 
TCAAATCAGCAATATTTCAGCCATATTGTGACAAGAATTGAATATAAATAATTATACTTA 
TAATGCTTATAAATATAAATTATATAAATACCTATAACTATAAATTAGTTATACTAGTAT 
TTATCAAAACATATTTGCCACGACACTGCACGCCGATACTTCAAGTGTCTTCACCTCAAG 
CGTGTAACTCCTCATACTCTGTAATAAGTATGTACACTAAGTGAGTGCTATCATCTCCAT 
GCTTCATTAGTTTCGTCAGATGCGTGTATCCATACGAGTACATTCAGATTATGGGATCCA 
GAGCTTTCTTATCTCAAGTATTTCCGATTGTAAAGCCATACTACTTCCCCAATGACTGAC 
GAGACAGATGGCAACCGTTCTTTCCTCCTGACTAGGTGAGTGCCACTGATAAATCATTAT 
GCCTTTAACATTAGGAATGTTAGCAGTGCACATGTTTCAGAATTGCGACCTTATGGTTGT 
AAAGATTACAAACTTTACAACTTACTTGAGACAGGTTCCATATGTCGTATCTGAAATAGT 
GTGAAGGTATCTGATTCGATGCAATACACAGACATATAAACATATTGTCGCCCTGCTATT 
CCGGAAAGGTCATTTTGTATGTAACGTTCCTTAATGGACACAAACGGAATTATTAGTTAA 
ACATACTCAACAAAACTATGTTATTTTGCAATGGGTAGCACCGAAATCTACCGACAGTGG 
TTCGTAAAAGTAGAACATTCTGACATAAAGAAAAATCATTGGCTTTAAATATATGCAAGT 
TACTTGTCTCTAACAACCAGTTTTATACACATTTCAGAGAACGGGGAATCCGCGATGACA 
ATATCAACGAGTATATACAGAATATATAATTAAAAACGATGAGTGCCTGGCAAGGGAAAG 
AGCGAGATTTGCCAAACAGGGGGGTGGTGTTGAGCTTGAATCGTGGAGAAACGTAGATTG 
AAAGACAAGATGACATCTAATGATCCGAAAATCAAACACAGGATTAACTGGGATGCAGAA 
GAATGAATATCTCAAGCATACATGCAACACTTCATGAATGCATCTCAAACATTTTCGTCA 
GATCGGATGCATGAAGATTTGTAAAGCAATGGTTTAAATTGTCCCTAAACGTTTAGTTGG 
AGATGTATGAGGCTAGGCTGTATGTTGAACGAAACCATTTAACATTGTTGTTCATGATTA 
TTTAATATTTTTTCATTTTATAGATGTACAATAAAATTGGAAACTAAACATTTCCCTTTA 
TTGTTTTGTATTTACCTGTTCATGGGTATGTTTTGAAAGATCGTGATATTTAGTTGGCAT 
TCACAAGTTGGAAAAAGGTCACTCAGTTTGATTTCAAGTTTATGTAACCTCTTTATCTGA 
CGCTCCAAAATATGTATAGCCTTGTTCATCTGTCGGTATGTGGATATTCCTACTTCAGGG 
TAGGGTAGCATTAATACTTACAAAACATAACGTGTACCAGATTTCAGTCACCTCAGAGAT 
GATAATGCATGTCGATATGATAGGTCAAAACTTTCGATATCAATCACAATGAACCTATGG 
ACCCTGAATCGGAATGATACGTTACACTTTAGAAACAATTCACAAATATGACTGTCACCC 
TTTCAGGTAATAATGTTTGACGGACTACGATAGTGCTGAACAGCAGGAGAGGCAACATGG 
TTCGATTGTGAGACAGGTTTAGTGTATTTGTTTGCGAATTTAAGGTTCTGAATCACAATA 
GACACGGTTCAGTTAATGGATAAACCAATCATTAGATAGATAGAGATTAGTCGCGATATT 
GCTGGGATAAAGCTTAGTGGGACGTTAAGTCCCATCTCAATCTCTCTCATTTTTTCCAAA 
ACAGTTTTAATTCAGGCTCATGACAAGGTCGTACTGTTGCAAAGGATTCTACTTCAAGCA 
GAGATGTCTCATGAATACAGTACAGGGTTTTTGAAGTTTATCCAGTGCAGCGCTGGCACC 
ATCTCTGCATGCGAATTATACCATCCATGCCGCTCTAGGCTATTTGTATTAAGTCTGTAG 
AATTAAATTCGCGAGTTGCAAATACTGCTCACCATTATCTGCCTCAACCCAGTTTGGGTA 
CATGCGATTTACACAATATTATGTATAATGTTCGCTTTTCGAAAACAAAACACCTAAATT 
CATCCAAAGTTTTGGGAGATTTTATTCGAGAAATCAACCTGAGATGTTGAATCGGGAGCT 
GCGCTTATTCAATGGTGGACTCGGAAGGGAAGTAACCGCTGATGAGGCAAAACAATAACG 
CAAACATATGGAAGTGGAACTCTTTGAACCAGTATTATGTTTGTGTGGACATGTATGTGT
TAATTTGACCATTCGAACAACTTTACTATTCTATTCATT^ATGTGTTTAGATTTACATTTG 
AATTAAAAGAGATGAGTTTAAGATATTAATATTTTCCTTTTATAGTCTGTCGTGATTGTA 
GGGCAATATTTATGTATGTTCGTTCATTTTTCATTTATCATTTGGAAAGGTATATCATAA 
GATTATTATTATCATTCTTGAAGTAATGTATACATATATATATGTCTTGAGTAGCTTATT 
TTCAATTTATTATCATCCGTCATCCAATTTTATTTCACGAAAGTATAAGAAATAACGAGA 
GAGAGAGAGAGAGAGAGAGAAAAGACAGAAATGAAGTTAGGAGATATNAGTTATCAAGAA 
AACAACAGTTTGAATTTTTTGTTTAGACAAGATATCATATCAATAACCTCGCACTATTAC 
GGGAATAGGCGGGCGTTCCATATGCACAATGAATCGTCAGTTAAAATCAACATTAAACTT 
AAAATACTCCTCATATTTAAAGTTGATCTACCTCTTGTATTATTGTAGACTATTAGACAG 
AAGTCGACAGTGACACCAGCAACCAGATATCATACCCAGACTTAAAAAGCTGTTTCCTTG 
ATGTTTCAATTTATTTCCATTTCCATTATTTCCCTTTATTGGTTTCCATTTATCAAACTT 
ACCATCTGCACCAGTGGGAGATTGATATGTTGTATTTATTTATATTTCTTGTACTACAAT 
ATCAAGAATGTATAGGAGCTATTCCTTGTTCCTAAAACCGGATAGATCCATAATTTCCAT 
TTTGGGATAAATGGAAACTAAACACAACTTTTACAGTAAACACGAGTGAGCAAGTTGAGT 
TTTACGCCGTTTTTAGTAGTATTCCAGCAATATCGCGGCGGGGGACACCAGAAATGGGCT 
TCACACAGTGAATGCATGTGGGGATTCGAACCCGGGTCTTCGGCGTGACGAGTGAACGCT 
TTAGCCACTAGGCTACCCCACCGCCTATTTATAGTTAAGACGAATACTTTTCTCAAGCCT 
CAAATATGTCCATTCTAGAGAGACTGAATCTGATCCTGAATCTGCGGACCGGTCTTGAAT 
ATCATCCCACTAACTCATTGTACAAAGTACCTGTAGATTGTCAGTTCAAAGACAGATTTC 
ACAACCCTATTATATTTTGTCCTGCTCATTAAGATATTCAGACTCACTCAAACTGCTAAA 
TGATTTTAATCCTACTTTGAGATGTTTTAACTTTTATTCGATGCATTTTTGCGTTCTGCG 
TCCTGTATAAAGGTAAAGCAGGTAAACTAACCTAACCTGTTGATTTATTTCATAGTTTTG 
CGATCAGATTGAAACCGGAATGCACAGTGAAGTGTGGCATACATCTTTCCACAGAGATAC 
TGGATACTAGGTGGTACAACCGCATTGGCTTTGTGAAAGGATATTAGTGTTTTATGAGAC 
TGACTCATGTTTCAATGCTTAGAGCGGAATGATCTCGGTCTTCATGT^AAAATATTGTGTT 
GAAGTAACCCCCCAGTCCCTAACAGAACGTGGGGAAAGCAGATGGATATGCCAAGACATC 
TTCGCATGGTGTGAAGATGATCGTTACAACATCTGCAGAAAAAGTTATTTCTGTGAAGAA 
TATGCCAAAGCATCACTGTGAGTGTTTTGAAGATGTGATATGGCAACACGCAGCGTGTAA 
TTATGCTTTGTGTGTATTTCTGAAGATCCGTATGAGCATGGCGCCAAACTATCAGTTAAA 
TGGCTATGCGAAGATCTTCCCGAGATGGTAAACACATATTTTGGCCATTTTCTTTGTAAG 
TGGGCGACACAGAAGATCCCCCTGATTGTGTGGATGAGGACACAAAAACGGGTCCCCCTT 
CCTTTGCTGATGCTAATGACGCCCTGGAAACATTGAAAGACTTCTTCTCCAGCAAGCAAG 
CCACCAACCACAAGTTGTATAAATCGCTTGCGGACTTGAATACGGCAGTTGGACAGATAC 
ATACAGCCAGAGAGGGCCGAACTAAAACATCTAAACATGGAAAAACTGTAAAGACAGGCT 
TTGTTGTACGACGTACGTAAATTCATTGAATGTTTGAAAAGGTAGAAAATTATTAAATCT 
TTGAAACCTCGCTCTGTTTGTTTGTTATTGTCCCCCACATTTGCAAATGGTATCCAAAAA 
GGGCAGACACATTTGTTTTAATCTTAGCCAGGTTCAATTTAGCCTTGCGCCCAGACTCAT 
TGTATCTGGTGAAGGCTATAGGTGGCCACGTCTTCTAAGATGCTATGCTATTCTTACCAG 
AATCCAATGTAAAGAGTTCAAACGCATGGTTCGCTTTGATTGTGATTCTTTCTTAGCACC 
TCTCTCCTACCCAGAGTTCACCTGCACTGCTCCTGACTCACAATAAGCTGACGTGCTGTC 
ATATATGTGCAACATTGTATACGTTGGCGTTAAGCCCAACTCACTTCCGCTGTCTTTTGG 
CAG 



DOMAIN 1A-1 (1st part of domain a) 



ACAACGTCGTCAGAAAGGACGTGAGTCACCTCACAGTTGACGAGGTGCAAGCTCTTCACG 
GCGCCCTCCATGACGTCACTGCATCTACAGGGCCTCTGAGTTTCGAAGACATAACATCTT 
ACCATGCCGCACCAGCGTCGTGTGACTACAAGGGACGGAAGATCGCCTGCTGTGTCCACG 
GTATGCCCAGTTTCCCCTTCTGGCACAGGGCATATGTCGTCCAAGCCGAGCGGGCACTGT 
TGTCCAAACGGAAGACTGTCGGAATGCCTTACTGGGACTGGACGCAAACGCTGACTCACT
TACCATCTCTTGTGACTGAACCCATCTACATTGACAGTAAAGGTGGAAAG 

INTRON 1A-1/1A-2 (SEQ ID NO: 111) 

GTAACTACAAACGTCGTCCCATTCATACAGGAGAAATATACAATTGTGTTGTAAGAGCGG 
TATACTGTTTGCCAACTGTGTAATTGAAACGTTGATGATGGTGTCTTTGTATTTCAATTT 
GTATGCACTTAGACATGATCAATGTTTCTGATGTGTCAAGGATGTTCGGTGTGTCACTTT 
CAAAAGATCAAATTCATATGACGTACACAGAGCAAGAACCAACAGTAAGAAGTCTGTATG 
ACTTCGCTCTTAAAAGCAATGGAAAAATATTTTCACTTAACACCTAGCCCATAATCACGC 
ATATTAGATTATTCAAGCGATGTCAACATGTTTTTAATATCAATCTCATGGTTCTGATAT 
TACCGGAGACATGCAACAGGCTGCCATTATAGCCAGGAAATCTTATGAATATGTGCATAT 
TTTTTCTTTGATTCTGTATGACGAGAAATATTCGGAGGCAAAGATTGTGTTTTCAGAACA 
GAATCAGGGTATCAGTGACATCGTCACTGCATGGCTACAATATTGCTGATGTGACTGTTT 
CTCCAAGGATTTTCATCTCACTGTCTGTACTTTGAATCTACAAATTCGTATTAAAGTTAT 
GACAATTTTACCCCTGCCTATTTGTAAACGAAATATAACATGAGTGTTTATGCTGACAG 

DOMAIN 1A-2 (2nd part of domain a) 

GCTCAAACCAACTACTGGTACCGCGGCGAGATAGCGTTCATCAATAAGAAGACTGCGCGA 
GCTGTAGATGATCGCCTATTCGAGAAGGTGGAGCCTGGTCACTACACACATCTTATGGAG 
ACTGTCCTCGACGCTCTCGAACAGGACGAATTCTGTAAATTTGAAATCCAGTTCGAGTTG 
GCTCATAATGCTATCCATTACTTGGTTGGCGGTAAATTTGA 

INTRON 1A-2/1A-3 (SEQ ID NO: 112) 

GTAAGTTTGGTTTACAGTTTCATTATAAAAACATAGCAGTTTTAAGTTTAGGGGCAGATT 
CTAATCTCTAATATTCCTTTCAACTCACTTTATTGGTGCCTTCTTGGAGTGACATTTAGA 
AACTAAGACAAGAGGAAGATGAACAATGTTTGTAGGGATAGACAGCTTGGATGCAATTTC 
GGACCAGATTCTAACAGCGTCATGAAGCAAGTGATACACAACGTTATCAATAACGAGAAT 
ATACACATAGATGGTTTGAGTTTATAAATGAACTATTAACGGCATTGTGGTTATAGACAG 
TGAGGAAGACGCCAGATAGACAAAGGGTAGGGGCCTTGGTTAGATAATGAGAAGTTGAAG 
AGGTGTAATAACTTAAATCTCTCTTGACTATTGATTGTGTCTAAGAGTTTTCTTATCTTA 
CAGTCGGCCAGTTGGGTCAAAGATGGTGTGATTCGGATGTGCTTTGTGTGTTCTGCGATG 
GCTGATTTAGAGTCAGTTTACTTCAGATGAATGAAGTTCCCCGATTCTTATGTTTAAGTT 
TGTTTCACCTACGCATGAAGACATCACCAGCAGGGTCGTCTTTATTTCTAGTAGCTTATT 
TACAGCAAGCTTGTAACGTATGCTGAATTGCTGTGCCTCTGTAGAACACAGCATCTATGT 
TTGCTTGCTTCTTTAGTAGACTGCGGATGTGATGGTTGGTTACCTGGTATGCTGACGAAA 
GAATTGTTGACGTGGTGGTTTGCCTTGATGGGTTCGTTGACTTGGTTTGTTGGATACTGA 
TTAAGGTGACTCTGCTGGGAGGCTTGGATTCTGGGGCCGGTGTTCTTTGCTCTCCTGTCT 
AGGGTGGCGATTATTTCCCAACCCACTTGTTCCATTACACTCAAAACCTGCTATCAATTT 
ACAG 

DOMAIN 1A-3 (3rd part of domain a) 

ATATTCAATGTCAAACTTGGAATACACCTCCTACGACCCCATCTTCTTCCTCCACCACTC 
CAACGTTGACCGCCTCTTCGCCATCTGGCAGCGTCTTCAGGAACTGCGAGGAAAGAATCC 
CAATGCAATGGACTGTGCACATGAACTCGCTCACCAGCAACTCCAACCCTTCAACAGGGA 
CAGCAATCCAGTCCAGCTCACAAAGGACCACTCGACACCTGCTGACCTCTTTGATTACAA 
ACAACTTGGATACAG 

INTRON 1A-3/1A-4 (SEQ ID NO: 113)

GTGAGACATTATTACACTTCTATTTAGTAGTGGGGGCGGGATAGCTCAGGTGGTAGAGCG 
TCGGCCTTCAGCTTCTAGTCTCGCCCACAAGAGCGCGCTGGCTAAAGGGCCGGAGTTAGA 
TTCCCGCGGGCGGCAGGCAATATCTCCGAAGGGGAGAACAGTTCTCCAGTCGGTGAAATT 
GGGGTGCAATGTTGTACCACTGAAATGCGTGCAGCACCAACCATCCAAATACCAGCCTTG 
CCGCGCTGGTCTGACTACATAGTACCACCCGGATTCAACCGGGCTATATAGGTTCTCCTC 
CAGCAGTAAATCTGACAGTCGCCATATAGCTGGGATATTGCTGAGTGCGACGTTAAGCCC 
CAACTCACTCACTTTATATTTAGTATTCTATTTAGTATCGACGCATGACCATGTGTGGTG 
GTCTACTCATCTCAACACGACCGATTAACGTTAAGAGCTGCCAACATGATTCTCTTTCTC 
TCTTTAGCCTCTTTATGCCAAAAGCTATATATTAATGTAGGACCCTACATATATTATTTC 
CAG 

DOMAIN 1A-4 (4th part of domain a) 

CTACGACAGCTTAAACCTGAATGGAATGACGCCAGAACAGCTGAAAACAGAACTAGACGA 
ACGCCACTCCAAAGAACGTGCGTTTGCAAGCTTCCGACTCAGTGGCTTTGGGGGTTCTGC 
CAACGTTGTTGTCTATGCATGTGTCCCTGATGATGATCCACGCAGTGATGACTACTGCGA 
GAAAGCAGGCGACTTCTTCATTCTTGGGGGTCAAAGCGAAATGCCGTGGAGATTCTACAG 
ACCCTTCTTCTATGATGTAACTGAAGCGGTACATCACCTTGGAGTCCCGCTAAGTGGCCA 
CTACTATGTGAAAACAGAACTCTTCAGCGTGAATGGCACAGCACTTTCACCTGATCTTCT 
TCCTCAACCAACTGTTGCCTACCGACCTGGGAAAGGTCACCTTGACC 

INTRON 1A-4/1B (SEQ ID NO: 114) 

GTAAGTTGATTGTCTTAATATTGTTTTAATTTTTGCAGAAATTTGATTTTAAATTGTGTA 
ATAACAGTACACATTTTTACGCAACAGCAGTCATTATTGTGTGTGAAGATGTCAAACCAG 
AAAGGTTTCAATCGTGAAAACAAAAACAATTCTCTATCTGTATACCCCTCAATACCAGTA 
TGATCACAAATCTAGGAAATATTACAATACTGCTTCATAGAGTAACTGCTGTTTGTGGCA 
GAGCTGGATACGAAGTTTCTGATAGTTCACAGCTACATGATAGTAAATGAACCTGTACAC 
ATCAACGGTTGATCATGAAAATTTTGTATGTGTGAAAGTGCTACCTGTATTAGTGAACGT 
GCTACCTGTATAACTGAAAGTGCTACCTGTATGACTGAAAGTGCTACCTGTATGCTGAAA 
GTGCTACCTGTATTAGTGAACGTGCTACCTGTATAACTGAAAGTGCTACCTGTATGACTG 
AAAGTGCTACCTGTATTAGTGAAAGTGCTACCTGTATGAGTGAACGTGCTACCTGTATAA 
CTGAAAGTGCTACCTGTATGACTGAAAGTGCTACCTGTATTAGTGAAAGTGCTGCCTGTA 
TTAGTGAAAGTGCTACCTGTATGACTGAGCGTGTTACCTGTATGACTGAACGTGCTACCT 
GTATTAGTGAAAGTGTAATCTGTATGAGTGAAAGTGCTACCTGTATTAGTGAAAGTGCTA 
CTTGTATTAGTGAAAGTGCTACATGTATGACTGAAAGTGCTACATGTATGAATGAGAGTG 
CTACCTGTGTGACTGAAAGTGCTACCTGTATTAGTGAAAGTGCTACCTGTATGACTGAAC 
GTGCTACCTGTATTAGTGATAGTGTCACTGGTACCAACTGGATGTTCTCACTTCTTTGGC 
GAATATCTGGGCTCAAAACAGTTTTTCAGTATCATAGTCGTATCAGTTTGATTTGTATGT 
GCAGTGGAATCATTTTCGTCAAATAATCAAAACTGGTGTTGAACTGGCGTTCACGTTTTA 
TGGTTGTAAAACAAATTCTGTAAGTAAAGATATTTTAGGGATATCTGTATGACATGAACT 
GAATTGCTTAAGGTTAGCATGCCATGACAAATTGCTGAATGTCTGAGGATTGGTGGAGCA 
ATAAATCATTATTAAGACAAAAATCAGAAACGTCCATTTTCACTTTTAACAGTGTATCTG 
TCTGAATGCCCCCTACTTTTTGGAAGAGTATATATGAATTATCGGCAATATAAAACGTTA 
AATGGCAAATGTCGGGCATATGTCAGGACATTATTACCGCAGTTTATAGTCATATTTACC 
GGGTCTAGGACAATTGTGACCCCGAGAATTGCCACCCGGACAATTGCCACCCAAAAATAA 
AATATACGTAAACAGAAAACAAATATTGCTTTCAGCCTTTATTGAGTTAGATAATGACAT 
TTATGTTGATAAATATGTCGTTTGATAATAATAATAACAATAATATAATATTACAATACT 
GCAATAGTACTATCAGTACTTATCATTTTATCACAGATTATATATAGATTCTAGAGTCCG 
ATGTTGTAGGCAACACTTCGTCGGTAGGCCGTTAGGTAGTTATCATTAGGGCTGAGTATT
GCGCCAAATTTCGTATTGCTATATACTGCGATACACGGTTACCTGTTTTGCAATACGTAA 
ACTTAGGCAAATATGACAGTTTTTCCATGATTATTTTCACGTTTCAATGCTTAAAATGGT 
CTTATCTGTTATCTCCTTGAAGGTTTAATAAAATAACAATAAACATAAATCATTATTGAA 
AATTAATGAACAAAAGTAAAGCGCTTCTCAGTTACCTTAACCTAACTTATTTATGAATGG 
GATTACTATCCAAGAATGTGAAATTCACAAACACCTTGGGATAACACTGCAAAACGACTG 
TTCATGGGACGGACATGAAAAAGGTGAGTCCCATGTTAAACTGTTGAGAAAGTTTCCTAT 
ACTGTTTGTCCCGAAAAAGGCTAAAGACCATGTACTAATCAATTATTCTATCTATTTTCG 
ATTACTGTTCTCATATTTGGGACAACTGTGCAGATCGGTAGCATCCAAGCTCGTCTAAAT 
CGGTTTGATAAACCTTGTCAAATAACATGTTGTCTCAACATCCAAGCTCACCTAAACCTT 
GTCAATACCTGCATCTGAACAAATGTATATTTAAGACGATAGCATCCAAGCTCATCTTTA 
AAATGAATATTTTCTCTTTTTCTACCAAAACATTATTTGGTTGACAGTTGTCCTCCCTAT 
TATAGTAAAAAGAACTGGGTGGCAATTGTCCTAGGTGGCAATTGTCCGGATGGCAATTGT 
CCGGGTGGCAATTGTCCGGGTGGCAGTTGTCCAGGTGGCTATTGTCCTGTTCCCATATTT 
ACGTATCCCATTTTCTGCTCTGTAATTTTAAATAAACTCACCTGCCTAAGGTAAGACGAC 
ATGTGTCACGTGAACATCGTTTGGGGGCAAGGGCGGAATCCCTTCGTTGAAAGTAAATGA 
ATACTGTACATAGAGATGCGTATCTTGAACTCTTTATTAGCTTTGATATTGTGCTTAATA 
TTACATGAATGTATTTCAATATGTAATTATGTGTTCAAATGAATGGTTGACTTGAATGGT 
TTTATTGCTTTATATGCTACATCAACATGTGTGTTTCTTTTCATTTCAG 

DOMAIN 1B

CACCTGTGCATCATCGCCACGATGACGATCTTATTGTTCGAAAAAATATAGATCATTTGA 
CTCGTGAAGAGGAATACGAGCTAAGGATGGCTCTGGAGAGATTCCAGGCCGACACATCCG 
TTGATGGGTACCAGGCTACAGTAGAGTACCATGGCCTTCCTGCTCGTTGTCCACGACCAG 
ATGCAAAAGTCAGGTTCGCCTGTTGTATGCATGGCATGGCATCCTTCCCTCACTGGCACC 
GGCTGTTCGTTACCCAGGTGGAAGATGCTCTTGTACGGCGTGGATCGCCTATCGGTGTTC 
CTTATTGGGACTGGACAAAACCTATGACTCACCTTCCAGACTTGGCATCAAATGAGACGT 
ACGTAGACCCGTATGGACATACACATCATAATCCATTCTTCAATGCAAATATATCTTTTG 
AGGAGGGACACCATCACACGAGCAGGATGATAGATTCGAAACTGTTTGCCCCAGTCGCTT 
TTGGGGAGCATTCCCATCTGTTTGATGGAATCCTGTACGCATTTGAGCAGGAAGATTTCT 
GCGACTTTGAGATTCAGTTTGAGTTAGTCCATAATTCTATTCATGCGTGGATAGGCGGTT 
CCGAAGATTACTCCATGGCCACCCTGCATTACACAGCCTTTGACCCCATTTTCTACCTTC 
ATCATTCCAATGTCGATCGTCTATGGGCAATCTGGCAAGCTCTTCAAATCAGGAGACACA 
AGCCATATCAAGCCCACTGTGCACAGTCTGTGGAACAGTTGCCAATGAAGCCATTTGCTT
TCCCATCACCTCTTAACAACAACGAGAAGACACATAGTCATTCAGTCCCGACTGACATTT 
ATGACTACGAGGAAGTGCTGCACTACAGCTACGATGATCTAACGTTTGGTGGGATGAACC 
TTGAAGAAATAGAAGAAGCTATACATCTCAGACAACAGCATGAACGAGTCTTCGCGGGAT 
TTCTCCTTGCTGGAATAGGAACATCTGCACTTGTTGACATTTTCATAAATAAACCGGGGA 
ACCAACCACTCAAAGCTGGAGATATTGCCATTCTTGGTGGTGCCAAGGAAATGCCTTGGG 
CGTTTGACCGCTTGTATAAGGTCGAAATAACTGACTCATTGAAGACACTTTCTCTCGATG 
TCGATGGAGATTATGAAGTCACTTTTAAAATTCATGATATGCACGGAAACGCTCTTGATA 
CGGACCTGATTCCACACGCAGCAGTTGTTTCTGAGCCAGCTCACC 

INTRON 1B/1C (SEQ ID NO: 115) 

GTAAGTAAATTTACAAAATTTGGTGTTCTCTAACTATCCTAAGTATTCAATCGTTAGCGT 
GTACCTATCTGCATAATGCAATACCCTGACTCCATATAAGTATAGTATATTTACTCTGGT 
CGAAAACAAACAAATTGAAAACAAGAGTGGACGTGCTGTTATGATTTCTTTTTCATTCTT 
GGTTCGTTGTGTAATGCCACAGCCAGCAATTCCAGATATATAGCGACGGTCTATGAATAC 
TCCAGTCTGGACCAGACAATCGTGTGGAATGGTTTAGGCACATTATATCAAATTCATTGT 
Fig. 4g

TGAAGATATGAGTTATGAGGTCACAATGTTGTCTTGTTACCCCGTGTCAGTAGTGACGTC 
ATTTCATGACTGAAATCTCTTCAACGCCGTTTAGCAATAATAGGCTCAGTAGTATTCAAC 
CAATTACAATCAGTAGAAAATTCTCTATACTATTCTTATGTTGCATCCTGATATCCCTAT 
GCAAAAATTAGTCATCTAATATAATCATTTTCGATAAATACTTTGGGCAAACAAATCAAT 
GTAACATCTATTTTCTTTCAG 

DOMAIN 1C 

CTACCTTTGAGGATGAAAAGCACAGCTTACGAATCAGAAAAAATGTCGACAGCTTGACTC 
CTGAAGAAACAAATGAACTGCGTAAAGCCCTGGAGCTTCTTGAAAATGATCATACTGCAG 
GTGGATTCAATCAGCTTGGCGCCTTCCATGGAGAGCCTAAATGGTGCCCTAATCCTGAAG 
CGGAGCACAAGGTTGCATGCTGTGTTCATGGCATGGCTGTTTTCCCTCATTGGCACAGGC 
TTCTTGCTCTCCAGGCGGAGAATGCTCTTAGAAAGCATGGGTACAGTGGTGCTCTACCAT 
ACTGGGATTGGACTCGCCCCCTTTCCCAACTTCCTGATCTGGTTAGTCATGAGCAGTATA 
CAGATCCTTCCGACCATCACGTGAAGCATAACCCGTGGTTCAATGGCCACATCGATACAG 
TAAATCAGGATACCACCAGAAGCGTACGGGAGGATCTTTATCAACAACCTGAATTTGGAC 
ATTTCACGGATATTGCTCAACAAGTCCTCTTAGCATTAGAACAAGATGACTTCTGTTCGT 
TTGAAGTGCAGTATGAGATTTCCCATAATTTTATCCATGCACTTGTAGGAGGAACCGACG 
CTTATGGCATGGCATCGCTGAGATATACAGCATACGATCCAATCTTTTTCTTGCATCATT 
CAAACACCGACAGGATCTGGGCTATTTGGCAATCCCTGCAAAAATACAGAGGCAAACCGT 
ACAACACTGCCAACTGCGCCATAGAATCTATGAGAAGGCCCCTGCAACCATTTGGACTAA 
GCAGTGCCATTAACGCTGACAGAATCACCAGAGAGCATGCTATCCCGTTTGATGTCTTCA 
ACTATAGAGATAACCTTCATTACGTATATGATACCCTGGAATTTAATGGTTTGTCGATTT
CACAACTTGATAGAGAGCTGGAAAAAATCAAGAGTCACGAAAGAGTATTTGCTGGATTCT 
TGCTGTCGGGGATTAAAAAATCTGCTCTTGTGAAATTCGAAGTTTGTACTCCACCTGATA 
ATTGTCATAAAGCAGGGGAGTTTTATCTACTCGGGGACGAAAACGAGATGGCTTGGGCCT 
ATGACCGACTTTTCAAGTATGATATTACTCAGGTTCTGGAAGCAAACCATCTACACTTCT 
ATGATCATCTCTTCATTCGCTACGAAGTCTTTGATCTTAAAGGAGTGAGTTTGGGAACTG 
ACCTGTTCCACACTGCAAATGTGGTACATGATTCCGGCACAG 

INTRON 1C/1D (SEQ ID NO: 116) 

GTACGTGGATTTGATTACATAGCAATGCTATATGATTTCAGTAATTACAACCTCAAGTCA 
TGTAGCCGTTTTAGATTGCATTACATCAAACAGCATTGGATTAAATTGGGGGATTGTCCA 
GGCCGCATTATGTTGCATTCCGAAAATAGTTTGTGTCCAGTGTCCACGTTTAAAATTAAA 
CCATTTTAATCATATTAGGGATAATTTTAATAGATGTTATAGTGCTTTATTTCATATTGT 
TACAGTGGACAGTCACCAAGGACATATTTTACTCTATAGATACACAAACACCAATTAAAA 
CCCTGCTTTGGAAAGTCTAACTTTTTCCCCACAG 

DOMAIN 1D

GCACCCGTGATCGTGATAACTACGTTGAAGAAGTTACTGGGGCCAGTCATATCAGGAAGA 
ATTTGAACGACCTCAATACCGGAGAAATGGAAAGCCTTAGAGCTGCTTTCCTGCATATTC 
AGGACGACGGAACATATGAATCTATTGCCCAGTACCATGGCAAACCAGGCAAATGTCAAT 
TGAATGATCATAATATTGCGTGTTGTGTCCATGGTATGCCTACCTTCCCCCAGTGGCACA 
GACTGTATGTGGTTCAGGTGGAGAATGCTCTCCTAAACAGGGGATCTGGTGTGGCTGTTC 
CTTACTGGGAGTGGACTGCTCCCATAGACCATCTACCTCATTTCATTGATGATGCAACAT 
ACTTCAATTCCCGACAACAGCGGTACGACCCTAACCCTTTCTTCAGGGGAAAGGTTACTT 
TTGAAAACGCAGTCACAACAAGGGACCCACAAGCCGGGCTCTTCAACTCAGATTATATGT 
ATGAGAATGTTTTACTTGCACTGGAGCAGGAAAATTATTGTGACTTTGAAATTCAGTTTG 
AGCTTGTTCATAACGCACTTCATTCCATGCTGGGAGGTAAAGGGCAGTACTCCATGTCCT 
CCCTGGACTATTCTGCGTTTGATCCCGTCTTCTTCCTACATCATGCCAACACGGACAGAC
TGTGGGCAATCTGGCAGGAACTACAAAGATTCCGAGAACTGCCTTATGAAGAAGCGAACT 
GTGCAATCAACCTCATGCATCAACCACTGAAGCCGTTCAGTGATCCACATGAGAATCACG 
ACAATGTCACTTTGAAATACTCAAAACCACAGGACGGATTCGACTACCAGAACCACTTCG 
GATACAAGTATGACAACCTTGAGTTCCATCACTTATCTATCCCAAGTCTTGATGCTACCC 
TGAAGCAAAGGAGAAATCACGACAGAGTGTTTGCGGGCTTCCTTCTTCATAACATAGGAA 
CTTCTGCTGACATAACTATCTACATATGTCTGCCTGACGGACGGCGTGGCAATGACTGCA 
GTCATGAGGCGGGAACATTCTATATCCTCGGAGGCGAAACAGAGATGCCTTTTATCTTTG 
ACCGTTTGTATAAATTTGAAATCACCAAACCACTGCAACAGTTAGGAGTCAAGCTGCATG 
GTGGAGTTTTCGAACTGGAGCTTGAGATCAAGGCATACAACGGTTCCTATCTGGATCCCC 
ATACCTTTGATCCAACTATCATCTTTGAACCTGGAACAG 

INTRON 1D/1E (SEQ ID NO: 117)

GTAATGCCATCTTAATACAGTTCGTTCGTTAAATTATATATGTTCGTTTACAACACCATA 
CCTTGAATTGAGGTAATACATCACTTGATATTGATAATGTAATGGTAATTGTTCTTGTTT 
GTAAAACCGTTTCTGGGGTGTTTATTCACTATCCACCTGGTGGATAGTGAGTAAACACAT 
TCGGTTTAATATGGGTATCTAATGGACAGTGAAGTGTGCTGGCTAGGCAGATACCTTGGT 
TTCTGTGAATGGAGGTAGTAGAAAGGGGTTTTGATGATTGCAG 

DOMAIN 1E

ATACCCATATCTTGGACCACGACCATGAGGAAGAGATACTTGTCAGGAAGAATATAATTG 
ATTTGAGCCCAAGGGAGAGGGTTTCTCTAGTCAAAGCTTTGCAAAGAATGAAGAATGATC 
GCTCCGCTGATGGGTACCAAGCCATTGCCTCTTTCCATGCCCTGCCACCACTCTGTCCCA 
ATCCATCTGCAGCTCACCGTTATGCTTGCTGTGTCCATGGCATGGCTACATTTCCCCAGT 
GGCACAGACTGTACACTGTTCAGGTTCAGGATGCCCTGAGGAGACATGGTTCACTTGTTG 
GTATTCCTTACTGGGACTGGACAAAACCAGTCAACGAGTTACCCGAGCTTCTTTCTTCAG 
CAACATTTTATCATCCAATCCGGAATATTAATATTTCAAATCCATTCCTCGGGGCTGACA 
TAGAATTTGAAGGACCGGGCGTTCATACAGAGAGGCACATAAATACTGAGCGCCTGTTTC 
ACAGTGGGGATCATGACGGATACCACAACTGGTTCTTCGAAACTGTTCTCTTTGCTTTGG 
AACAGGAAGATTACTGCGATTTTGAAATACAATTTGAGATAGCCCATAATGGCATCCACA 
CATGGATTGGTGGAAGCGCAGTATATGGCATGGGACACCTTCACTATGCATCATATGATC 
CAATTTTCTACATCCACCATTCACAGACGGACAGAATATGGGCTATTTGGCAAGAGCTGC 
AGAAGTACAGGGGTCTATCTGGTTCGGAAGCAAACTGTGCCATTGAACATATGAGAACAC 
CCTTGAAGCCTTTCAGCTTTGGGCCACCCTACAATTTGAATAGTCATACGCAAGAATATT 
CAAAGCCTGAGGACACGTTTGACTATAAGAAGTTTGGATACAGATATGATAGTCTGGAAT 
TGGAGGGGCGATCAATTTCTCGCATTGATGAACTTATCCAGCAGAGACAGGAGAAAGACA 
GAACTTTTGCAGGGTTCCTCCTTAAAGGTTTTGGTACATCCGCATCTGTGTCATTGCAAG 
TTTGCAGAGTTGATCACACCTGTAAAGATGCGGGCTATTTCACTATTCTGGGAGGATCAG 
CCGAAATGCCATGGGCATTCGACAGGCTTTATAAGTATGACATTACTAAAACTCTTCACG 
ACATGAACCTGAGGCACGAGGACACTTTCTCTATAGACGTAACTATCACGTCTTACAATG 
GAACAGTACTCTCGGGAGACCTCATTCAGACGCCCTCCATTATATTTGTACCTGGACGCC 

INTRON 1E/1F-1 (SEQ ID NO: 118) 

GTGAGTACCTGTTTGCACTAAGACTTCTGTAGGCTAAAAGTGTAAGAAATATCAATTAAT 
TTCAATTCACCCAAACTTGAAAACGGTACCTATATAGGTTAACTTTTTGTCTACAGTAAA 
CTGAACATACCTACACATTTCATGAAATGATCTCTCAATATTTTCCACCAACAG 

DOMAIN 1F-1 (1st part of domain f)

ATAAACTCAACTCACGGAAACATACACCTAACAGAGTCCGCCATGAGCTAAGTAGCCTTA 
GTTCCCGTGACATAGCAAGCTTGAAGGCAGCTTTGACAAGCCTTCAACATGATAATGGGA 
CTGATGGTTATCAAGCTATTGCTGCCTTCCATGGCGTTCCTGCGCAGTGCCACGAGCCAT 
CTGGACGTGAG 

INTRON 1F-1/1F-2 (SEQ ID NO: 119) 

GTAAATTTACAGAGCTTTATGAAGTGTGTTCAGAGTGAAGAGACCAAGATATACTTATAC 
CCAAAACTAGCTAGCAACAGACGATTTCACTTGTTTCGGACACTTTGTATTATACGTTGG 
ATCCCAAGGTAAACGGAAACGTAACCGAGAATCAGTCCGTAAAGTGAGTGAGTGAGTTTG 
GGGCTTAACGTCGCACTCAGCAATACCCCAGCTATGTGGCGACTCTCAGATTTACTGCTG 
GAGGAGAACCTACATAGCCCGGTTTAACCCGTGTGGTATGTAGTAAGACCAGCGCGGCAT 
GGCTGGTATCTGACGGACGAAGGGTGGCGCTGCACGTATTCCAGTGGTACAACACTGCAC 
CCCAATTTCACCGACCGGAGAACTGATCTCCCCTTCGGAGATATCGCCTGCCTTCCACGG 
GATTCGAACTCGGTGACCTTCAAGCCAGCGCGCTTCTAGCGGGGGCGATTAGAGGTTNAA
GGCCGACGGCTCTACCACCTTAACTATCCCCCGGCCCCACTCCTGACGGAAATGTTTATA 
ATTCAGCCTTTGTTTTCTTATTAAACACTCTTGGCAGATTTTCTATAGATAATGGATTCA 
CATGTAGACAGTCTCCCATTGTTGTAACTGGTAGTCAAGAGTTAGAATCTGAATACATTC 
TCCAAGATGGATCAAGGAAAACAATAATTACTTGATGTTGCAG 

DOMAIN 1F-2 (2nd part of domain f)

ATCGCCTGTTGCATCCACGGCATGGCGACGTTTCCTCACTGGCACCGGTTGTACACTCTG 
CAGTTGGAGCAAGCGCTGCGCAGACACGGGTCCAGTGTTGCTGTTCCATACTGGGACTGG 
ACCAAGCCAATCACCGAACTGCCACACATTCTGACAGACGGAGAATATTATGACGTTTGG 
CAAAATGCCGTCTTGGCCAATCCGTTTGCAAGAGGTTATGTGAAAATTAAAGATGCATTT 
ACGGTGAGAAATGTCCAGGAAAGTCTGTTCAAAATGTCAAGTTTTGGAAAGCACTCGCTT 
CTGTTTGACCAGGCTTTGTTGGCTCTTGAACAAACTGACTACTGTGACTTCGAAGTTCAG 
TTTGAAGTGATGCATAACACGATCCATTATCTCGTAGGAGGGCGTCAAACGTACGCCTTC 
TCCTCTCTCGAGTATTCCTCATACGATCCAATCTTCTTTATTCACCACTCGTTTGTTGAC 
AAAATATGGGCTGTATGGCAAGAACTGCAAAGCAGGAGACATCTACAGTTTAGAACAGCT 
GATTGTGCTGTGGGCCTCATGGGTCAGGCAATGAGGCCTTTCAACAAGGATTTCAACCAC 
AACTCGTTCACCAAGAAGCACGCAGTCCCTAATACAGTATTTGATTATGAAGATCTTGGC 
TATAACTATGACAACCTTGAAATCAGTGGTTTAAACTTAAATGAGATCGAGGCGTTAATA 
GCAAAACGCAAGTCACATGCTAGAGTCTTTGCTGGGTTCCTGTTGTTTGGATTAGGAACT 
TCGGCTGATATACATCTGGAAATTTGCAAGACATCGGAAAACTGCCATGATGCTGGTGTG 
ATTTTCATCCTTGGAGGTTCTGCAGAGATGCATTGGGCATACAACCGCCTCTACAAGTAT 
GACATTACAGAAGCATTGCAGGAATTTGACATCAACCCTGAAGATGTTTTCCATGCTGAT 
GAACCATTTTTCCTGAGGCTGTCGGTTGTTGCTGTGAATGGAACTGTCATTCCATCGTCT 
CATCTTCACCAGCCAACGATAATCTATGAACCAGGCGAAG 

INTRON 1F-2/1G-1 (SEQ ID NO: 120) 

GTGAGATATATGCAAATTGAATGTTGTCCAGATGCGTTGTTTACATTTATATGCTTGGAA 
TTGTCCTGAACGAATACAGTGGAATAACCAAAAGCTGAAAAATAAAAAGATATATACTTC 
ATTCTGAATTTGTCAGTATTGCTGACCCAAAAACACGTTATCCATGTCGACACTATATTT 
GCCTTTCTGAATCTGAGACTGCGTTATGTTTCTAATAATCACGAAATATGGTATACAGGT 
TGTGTATCTGTAGAATACCCAAGGCAGAATTTAAAGGGTCACACCCTGTTTAATACAG 

DOMAIN 1G-1 (1st part of domain g)



ATCACCATGACGACCATCAGTCGGGAAGCATAGCAGGATCCGGGGTCCGCAAGGACGTGA 
ACACCTTGACTAAGGCTGAGACCGACAACCTGAGGGAGGCGCTGTGGGGTGTCATGGCAG 
ACCACGGTCCCAATGGCTTTCAAGCTATTGCTGCTTTCCATGGAAAACCAGCTTTGTGTC 
CCATGCCTGATGGCCACAACTACTCATGTTGTACTCACG 

INTRON 1G-1/1G-2 (SEQ ID NO: 121) 

GTAAGTTTGTGTTGGTTAGTGTTGGTTGCATGTTTTGCCATATCGATAGTATCAGTGTGG 
TAACATCTGGTTTCTAGTTCATTCAGTTCACCTTATCAGAAGCTGTTTGCTCTCGTCTAC 
AATAGTGACGTCTTTCAGTTTTAGAACCGTGTACATCCGGGTTATATTGGTCTCCAGCAA 
CCCGTGCTTGTCGTGGGAGGCCACTGATGGGAACGGGTGGTCAGACTCGCTCACTTAGTT 
GACACATGTCAATTGCGAAGATCGATGCTGAGGTTGTTAAACATTGGATTGTCTGGTCCA 
GACTCGATTATTTACAGACAGCCGCCATGTACCTGGAATATTGCTGAGTGCGGCGTTAAA 
CAACAAACTAGTCAGACTAATCTTTCACTGTTTATAATGATGGCTCGAACCTAGCACTCA 
TGTCCCAAGTTGGCGAACATCTGGAAGGGAATTTCAAATGAAAAGAACAATCTTTCACGT 
CTATTGGTATCACGCTCCTGGAGAAGAACATGATGTTCACGGCGTTACTTCCTCTTACCT 
GTTTTACTTGTTCCCACGTTTCTTCATATTTAAAGAGTATTTGGGTATTAGAGCTTTGGT 
GCTGTTACAATGCTACTCAACTGTTCAGTGCGGGCGACCGCGCTTGTTTACACATTAAGT 
TTTGTTTGTTGGTTGGTTTGTGTGTGTGTGTGTATGTGTGTGTGTGTGTGTGTGTGTGTA 
TGTGTGTGTGTGTGTATCTATGTCTATGTGTCTGTGTCTGTGTGTCTGTCTATGTGTGTG 
TGTGTCTGTGTCTATGTGTGTGTCTGCGTGTGTGTCTGTGTCCGTATGTGGCTGTGTCTA 
TGTGTGTGTGTGTCTGTGTTTATGTGTGTATATGCGTGTGTGTCTGTGTCCGTATGTGGC 
TGTGTCTATGTGTGTGACATGCAATACATGCTGTGATACTCACTAGCTGCGTCTATCGAC
CAG

DOMAIN 1G-2 (2nd part of domain g) 

GCATGGCTACCTTCCCACACTGGCATCGCCTCTACACCAAGCAGATGGAGGATGCAATGA 
GGGCGCATGGGTCTCATGTCGGCCTGCCCTACTGGGACTGGACTGCTGCCTTCACCCACC 
TGCCAACACTGGTCACCGACACGGACAACAACCCCTTCCAACAT 

INTRON 1G-2/1G-3 (SEQ ID NO: 122) 

GTAAGAGCGGGGTAGGGATGGGGTGGTAGGGGGTGGGTTGTTCTATTACTTCCCGCTTCA 
CTTGTATGAAATGGATAACCTTGGCTGCATCCCAATTGCGTGATCGATTCTCTTTCGATT 
CACTCGTGCGATTAGACTGCCTTATTTACTATAGTAGTTAGAATGTTGCTCAGTGCGCCG 
TTAAACAACTAATACACAAAACCGCATTTGTTTTATATGGTCACTCTACTGTTTATCACG 
TATATGTATGTTCCGACTCACTGGTTGGTGCGTACCATTCTACTGTCACACTGAGAGCCA 
ATGTTCTCAGATGTGTGAAATGTTTGAAAGCCGTTTCTACATAATATTGCAGGAATACCA 
TTGTAGAATGTAGTCAAACAGGTAACAATCTGTTAGTGAGCCCAGTTCGAGGTTGCGTTG 
TAGGGTGTAGTCCAACAGGTAGGCAGTCCATAAGCATAGTTTTTAAGCATTTTAGATCAT 
CTATAATTAACCACATGGTTAGCCGCTATGTTTAGTTTAATCCAGTATAAGTTAGAACTG 
TTATATTTCGAAGGGAAGTGAGTAAATCCTTATTCCTTGACTACCATTTAATAGATTTCC 
CAATGACTCCATTCAACTCCTAACTTTCACATCACTGCTCTCTTCAACAG 

DOMAIN 1G-3 (3rd part of domain g) 

GGACACATTGATTATCTCAATGTCAGCACAACTCGATCTCCCCGAGACATGCTGTTCAAC 
GACCCCGAGCATGGATCAGAGTCGTTCTTCTACAGACAAGTCCTCTTAGCTCTGGAACAA 
ACTGATTTCTGCAAATTCGAAGTTCAGTTTGAGATAACCCACAATGCCATCCATTCCTGG
ACAGGTGGCCACAGCCCCTACGGAATGTCCACTCTCGACTTCACTGCCTACGATCCTCTC 
TTCTGGCTTCACCACTCCAACACCGACAGAATCTGGGCTGTCTGGCAAGCTTTGCAAGAA 
TACAGAGGACTTCCATACAACCATGCCAATTGTGAGATCCAGGCAATGAAAACGCCCCTG 
AGGCCTTTCAGTGACGATATCAACCACAACCCAGTCACAAAGGCTAACGCGAAGCCATTA 
GATGTGTTCGAGTATAATCGGTTGAGCTTCCAGTACGACAACCTCATCTTCCATGGATAC 
AGTATTCCGGAACTTGATCGCGTGCTTGAAGAAAGAAAGGAGGAGGACAGAATATTTGCT 
GCCTTCCTTCTCAGTGGAATCAAGCGTAGTGCTGATGTAGTGTTCGACATATGCCAGCCA 
GAACACGAATGTGTGTTCGCAGGGACTTTTGCGATTTTGGGAGGGGAGCTAGAAATGCCC 
TGGTCCTTCGACAGACTGTTCCGCTATGATATCACCAAGGTGATGAAGCAGCTACACCTG 
AGGCATGACTCTGACTTTACCTTCAGGGTGAAGATTGTCGGCACCGACGACCACGAGCTT 
CCTTCAGACAGTGTCAAAGCACCAACTATTGAATTTGAACCGGGCG 

INTRON 1G-3/1H (SEQ ID NO: 123)

GTGAGTACGACAGGCATTTCTAGTAAAAACCTACTTTTGGTAAAAGGTTCGAGAAATCAC 
TTGAAGCAACAACATGATTTTGTAACGCCTATTACACGTGAACATGTCACACCCGGTGAT 
GCCGTTTAATGGACATGCCTCTGTTAATGAAAGGGGTAAGTACATGTGTATGGGGATGGG 
ATGGGAGCCACCTGTCCCAATTTCATAGGTCCCTAGGATCCCAGTTGCGTAGGAATCCCC 
TGATTAATGCCTTGTGAATTCCTCCTGGAATTGTCCTGGCCCAAATTTTTACAAACCCGC 
CCCGATATACCTTGGAAATAATTGGGCCTAAGGGTGGGGCTTTTAAGGACCAAGAACCCA 
ACCTAAACCCCAACCCATTTTTTCCCACCCATTCCAGGTTTTGTTTTACCAAATAAAAAG 
GTTTCCACTTTGAGGAAACCCTTTAAGGGTTCTTTTCAGGGCTTTTTTTCTTTTCTGGGA 
ATTCCAATTCCGGGGGAACAAAATACATATATTTCACAGACCTTTGGTCAAATTTATATA 
ATTTCCGACTTCATGTCATAGGTTTGTCTTTCTTCCTACACAG 

DOMAIN 1H 

TGCACAGAGGCGGAAACCACGAAGATGAACACCATGATGACAGACTCGCAGATGTCCTGA 
TCAGGAAAGAAGTTGACTTCCTCTCCCTGCAAGAGGCCAACGCAATTAAGGATGCACTGT 
ACAAGCTCCAGAATGACGACAGTAAAGGGGGCTTTGAGGCCATAGCTGGCTATCACGGGT 
ATCCTAATATGTGTCCAGAAAGAGGTACCGACAAGTATCCCTGCTGTGTCCACGGAATGC 
CCGTGTTCCCCCACTGGCACCGCCTGCATACCATTCAGATGGAGAGAGCTCTGAAAAACC 
ATGGCTCTCCAATGGGCATTCCTTACTGGGATTGGACAAAGAAGATGTCGAGTCTTCCAT 
CTTTCTTTGGAGATTCCAGCAACAACAACCCTTTCTACAAATATTACATCCGGGGCGTGC 
AGCACGAAACAACCAGGGACATTAATCAGAGACTCTTTAATCAAACCAAGTTTGGTGAAT 
TTGATTACCTATATTACCTAACTCTGCAAGTCCTGGAGGAAAACTCGTACTGTGACTTTG 
AAGTTCAGTATGAGATCCTCCATAACGCCGTCCACTCCTGGCTTGGAGGAACTGGAAAGT 
ATTCCATGTCTACCCTGGAGCATTCGGCCTTTGACCCTGTCTTCATGATTCACCACTCGA 
GTTTGGATAGAATCTGGATCCTTTGGCAGAAGTTGCAAAAGATAAGAATGAAGCCTTACT 
ACGCATTGGATTGTGCTGGCGACAGACTTATGAAAGACCCCCTGCATCCCTTCAACTACG 
AAACCGTTAATGAAGATGAATTCACCCGCATCAACTCTTTCCCAAGCATACTGTTTGACC 
ACTACAGGTTCAACTATGAATACGATAACATGAGAATCAGGGGTCAGGACATACATGAAC 
TTGAAGAGGTAATTCAGGAATTAAGAAACAAAGATCGCATATTTGCTGGTTTTGTTTTGT 
CGGGCTTACGGATATCAGCTACAGTGAAAGTATTCATTCATTCGAAAAACGATACAAGTC 
ACGAAGAATATGCAGGAGAATTTGCAGTTTTGGGAGGTGAGAAGGAGATGCCGTGGGCAT 
ATGAAAGAATGCTGAAATTGGACATCTCCGATGCTGTACACAAGCTTCACGTGAAAGATG 
AAGACATCCGTTTTAGAGTGGTTGTTACTGCCTACAACGGTGACGTTGTTACCACCAGGC 
TGTCTCAGCCATTCATCGTCCACCGTCCAGCCCATGTGGCTCACGACATCTTGGTAATCC 
CAGTAGGTGCGGGCCATGACCTTCCGCCTAAAGTCGTAGTAAAGAGCGGCACCAAAGTCG 
Fig. 4l



AGTTTACACCAATAGATTCGTCGGTGAACAAAGCAATGGTGGAGCTGGGCAGCTATACTG 
CTATGGCTAAATGCATCGTTCCCCCTTTCTCTTACCACGGCTTTGAACTGGACAAAGTCT 
ACAGCGTCGATCACGGAGACTACTACATTGCTGCAGGTACCCACGCGTTGTGTGAGCAGA 
ACCTCAGGCTCCACATCCACGTGGAACACGAGTAG 



3'UTR 



TTCACAG 

INTRON 3'UTR (SEQ ID NO: 124)

GTGAGGAGAAGGCCCCAGGCTAGCAGGGCAATGGATGAAGGAAATAGGGGCAAAGGGAAT 
AGCAGTTACACCATCGACATTTCCAACCTCCTCAGAAACTAATATATAGCCTTAATACAA 
CCAGCCAAGACTCAACGGGCAGCCGGGGTGGGGGGATTTGGTGGTCGCTGTTTCAGACCA 
GGGTGCAAAATATCAGTGCGCAAATCAACATGTTGCGTGTCAGACACTGACACAGCAGTC 
ATTGAACCTGCAGACCCATAACAGGAAAATGGGGCAGATACGATCAAAGACAGTGTAAAA 
TAGGGATAAGTAGGCATATGCAACCACCTGATGGAAATGAAAAGGGGTAAGTTTAAACCC 
CGGCTACCAAAGGTCCAATGGTTCCTTAACCCAGCTTACGCTATCCCTCTAATTTCAGTA 
TTGAGCTGATTTCTGTCGAGTTCATGTAAACTGTATACTTTCTGTATTATTACAG 

3'UTR

GTTGCTATGCCGACTGCGCTATATTGGTGAACGAGACGATGAGGACATCTCTGAAAGAGT 
TCGCCAAGTGATGTGTAGGTCACGGAAGTATTGTTGAGCTAACAATATGATGATTTCAAA 
ATGACTTGGCGCTCTAGGACAAAGACATAATTCATCAGCACCCTGTGCACCAACTCTTTG 
TTTGCTGCAAACGTCTGACAAGCGACACGTCAATCAACAAGCTGTTCAAACTCAAGTGGA 
TGTAACTAGAATCGTTGGGCCATCGTTCACAAAGTATTGACAGATGTCACACATGATGGC 
GAGAAACACTTTAGAACTTTTAATGACCTAGAGTGACTTGTAAATATGTAAATATATTCT 
TCAAAGACTCAGCTGAACTATTGTTGGATAACACATCAATTCCCTCAACAAAATGCTTTA 
TCTTCACATGGATGTATGTAATGTGGCCGGCAATAAAGTATATATATGTAT 
Primary structure of the HtH1 protein

SIGNAL PEPTIDE 
LVQFLLVALWGAGA 
DOMAIN A 

DNWRKDVSHLTVDEVQALHGALHDVTASTGPLSFEDITSYHAAPASCDYKGRKIACCVHGMPSFP 
FWHRAYWQAERALLSKRKTVGMPYWDWTQTLTHLPSLVTEPIYIDSKGGKAQTNYWYRGEIAFIN
KKTARAVDDRLFEKVEPGHYTHLMETVLDAIjE 

YTSYDPIFFLHHSNVDRLFAIWQRLQELRGKNPNAMDCAHELAHQQLQPFNRDSNPVQLTKDHSTP 
ADLFDYKQLGYSYDSLNLNGMTPEQLKTELDERHSKERAFASFRLSGFGGSANVVVYACVPDDDPR 
SDDYCEKAGDFFILGGQSEMPWRFYRPFFYDVTEAVHHLGVPLSGHYYVKTELFSVNGTALSPDLL 
PQPTVAYRPGK 

DOMAIN B 

GHLDPPVHHRHDDDLIVRKNIDHLTREEEYELRMALERFQADTSVDGYQATVEYHGLPARCPRPDA 
KVRFACCMHGMASFPHWHRLFVTQVEDALVRRGSPIGVPYWDWTKPMTHLPDLASNETYVDPYGHT
HHNPFFNANISFEEGHHHTSRMIDSKLFAPVAFGEHSHLFDGILYAFEQEDFCDFEIQFELVHNSI 
HAWIGGSEDYSMATLHYTAFDPIFYLHHSNVDRLWAIWQALQIRRHKPYQAHCAQSVEQLPMKPFA
FPSPLNNNEKTHSHSVPTDIYDYEEVLHYSYDDLTFGGMNLEEIEEAIHLRQQHERVFAGFLLAGI 
GTSALVDIFINKPGNQPLKAGDIAILGGAKEMPWAFDRLYKVEITDSLKTLSLDVDGDYEVTFKIH 
DMHGNALDTDLIPHAAWSEPAH

DOMAIN C 

PTFEDEKHSLRIRKNVDSLTPEETNELRKALELLENDHTAGGFNQLGAFHGEPKWCPNPEAEHKVA 
CCVHGMAVFPHWHRL^LQAENALRKHGYSGALPYWDWTRPLSQLPDLVSHEQYTDPSDHHVKHNP 
WFNGHIDTVNQDTTRSVREDLYQQPEFGHFTDIAQQVLLALEQDDFCSFEVQYEISHNFIHALVGG 
TDAYGMASLRYTAYDPIFFLHHSNTDRIWAIWQSLQKYRGKPYNTANCAIESMRRPLQPFGLSSAI 
NPDRITREHAIPFDVFNYRDNLHYVYDTLEFNGLSISQLDRELEKIKSHERVFAGFLLSGIKKSAL
VKFEVCTPPDNCHKAGEFYLLGDENEMAWAYDRLFKYDITQVLEANHLHFYDHLFIRYEVFDLKGV
SLGTDLFHTANWHDSGT

DOMAIN D

GTRDRDNYVEEVTGASHIRKNLNDLNTGEMESLRAAFLHIQDDGTYESIAQYHGKPGKCQLNDHNI
ACCVHGMPTFPQWHRLYWQVENALLNRGSGVAVPYWEWTAPIDHLPHFIDDATYFNSRQQRYDPN 
PFFRGKVTFENAVTTRDPQAGLFNSDYMYENVLLALEQENYCDFEIQFELVHNALHSMLGGKGQYS 
MSSLDYSAFDPVFFLHHANTDRLWAIWQELQRFRELPYEEANCAINLMHQPLKPFSDPHENHDNVT 
LKYSKPQDGFDYQNHFGYKYDNLEFHHLSIPSLDATLKQRRNHDRVFAGFLLHNIGTSADITIYIC 
LPDGRRGNDCSHEAGTFYILGGETEMPFIFDRLYKFEITKPLQQLGVKLHGGVFELELEIKAYNGS
YLDPHTFDPTIIFEPGT

DOMAIN E 

DTHILDHDHEEEILWKNIIDLSPRERVSLVKALQRMKNDRSADGYQAIASFHALPPLCPNPSAAH 
RYACCVHGMATFPQWHRLYTVQVQDALRRHGSLVGIPYWDWTKPVNELPELLSSATFYHPIRNINI 
SNPFLGADIEFEGPGVHTERHINTERLFHSGDHDGYHNWFFETVLFALEQEDYCDFEIQFEIAHNG 
IHTWIGGSAVYGMGHLHYASYDPIFYIHHSQTDRIWAIWQELQKYRGLSGSEANCAIEHMRTPLKP
FSFGPPYNLNSHTQEYSKPEDTFDYKKFGYRYDSLELEGRSISRIDELIQQRQEKDRTFAGFLLKG 
FGTSASVSLQVCRVDHTCKDAGYFTILGGSAEMPWAFDRLYKYDITKTLHDIVINLRHEDTFSIDVTI 
TSYNGTVLSGDLIQTPSIIFVPGR

DOMAIN F 

HKLNSRKHTPNRVRHELSSLSSRDIASLKAALTSLQHDNGTDGYQAIAAFHGVPAQCHEPSGREIA 
CCIHGMATFPHWHRLYTLQLEQALRRHGSSVAVPYWDWTKPITELPHILTDGEYYDVWQNAVLANP
FARGWKIKDAFTVRNVQESLFKMSSFGKHSL^ 

RQTYAFSSLEYSSYDPIFFIHHSFVDKIWAWQELQSRRHLQFRTADCAVGLMGQAMRPFNKDFNH 
NSFTKKHAVPNTVFDYEDLGYNYDNLEISGLNLNEIEALIAKRKSHARVFAGFLLFGLGTSADIHL 
EICKTSENCHDAGVIFILGGSAEMHWAYNRLYKYDITEALQEFDINPEDVFHADEPFFLRLSWAV 
NGTVIPSSHLHQPTIIYEPGE

DOMAIN G 

DHHDDHQSGSIAGSGVRKDVNTLTKAETDNLREALWGVMADHGPNGFQAIAAFHGKPALCPMPDGH 
NYSCCTHGMATFPHWHRLYTKQMEDAMRAHGSHVGLPYWDWTAAFTHLPTLVTDTDNNPFQHGHID 
YLNVSTTRSPRDMLFNDPEHGSESFFYRQVLLALEQTDFCKFEVQFEITHNAIHSWTGGHSPYGMS 
TLDFTAYDPLFWLHHSNTDRIWAVWQALQEYRGLPYNHANCEIQAMKTPLRPFSDDINHNPVTKAN 
AKPLDVFEYNRLSFQYDNLIFHGYSIPELDRVLEERKEEDRIFAAFLLSGIKRSADWFDICQPEH
ECVFAGTFAILGGELEMPWSFDRLFRYDITKVMKQLHLRHDSDFTFRVKIVGTDDHELPSDSVKAP
TIEFEPG

DOMAIN H

VHRGGNHEDEHHDDRLADVLIRKEVDFLSLQEANAIKDALYKLQNDDSKGGFEAIAGYHGYPNMCP

ERGTDKYPCCVHGMPVFPHWHRLHTIQMERALKNHGSPMGIPYWDWTKKMSSLPSFFGDSSNNNP^ 

YKYYIRGVQHETTRDINQRLFNQTKFGEFDYLYYLTLQVLEENSYCDFEVQYEILHNAVHSWLGGT 

GKYSMSTLEHSAFDPVFMIHHSSLDRIWILWQKLQKIRMKPYYALDCAGDRLMKDPLHPFNYETVN 

EDEFTRINSFPSILFDHYRFNYEYDNMRIRGQDIHELEEVIQELRNKDRIFAGFVLSGLRISATVK 

VFIHSKNDTSHEEYAGEFAVLGGEKEMPWAYERMLKLDISDAVHKLHVKDEDIRFRVVVTA 

VTTRLSQPFIVHRPAHVAHDILVIPVGAGHDLPPKVWKSGTKVFIFTPIDSSWKAMVEL^ 

AKCIVPPFSYHGFELDKVYSVDHGDYYIAAGTHALCEQNLRLHIHVEHE 
Genomic sequence of the HtH2 gene



DOMAIN 2A-1 (1st part of domain a) 
[domain a, parts 1-4: SEQ ID NO: 156] 

GGTCTTCCGTACTGGGACTGGACGCAGCATCTGACTCAACTCCCAGATCTGGTGTCAGACCCCTTG 
TTTGTCGACCCGGAAGGAGGAAAG 

INTRON 2A-1/2A-2 (SEQ ID NO: 125) 

GTAAGGGATCTCAGATCCGTCAGAGTGAGTGAGTGAGTGAGTGAGTGCCCAGCAACTGAAGCTAGG 
CCGCCCTACTGGGGATCACAGGGAATGTATGTCAATGGTTGAAGAAAGGAGCAGTGGGTTACAACG 
CCGCGTTCAAAGTCATGGCAGTTTCATAGCGCATTGTGCGCGCGTGTGTATCTGTGTGCGCGCGTG 
TGTGCTTGCGTGCGTGTGAGTGAGTCCGCTTGTGCATTTGTACTAGCACAGACTAATGCTGGTTCT 
AGAGAGCCTACTGATAAATGTTTACATTAAGATCTTTACAGTATACTGAGATTCGAGCCCAGACCA 
GCGGAACACCAGGCAGGGTAACAACAAATAACGCCTTTCCACACAACCGACGCAGCCTAAAGTGGC 
TCTGATAGGCTGATACCGGTGTATTCTTAGAACTTGTAATTTGTGCTTTGCCATAATACATGTACT 
TCAGTTAACTGTAATACAGCATAAGACTGGACCGGTGTTTACGACGCAATGAGCAATAATTACTCT 
ACGAAAAGATTTGGTTAGACATATTCAATAATTGTAACATTCATTAACAATGAACACCACGTGCAC 
TCTCGTTTGTGTCAACGTATTCATAATCATTCTCATGCATCTGTTAGCTCAGATATTTTGATGTTT 
CAAGAGATTTGTACGAACGTATGGGCTGGTGCCCCATGAAATTACATACAATGAATTCAGGTGAAA 
TACCTGGCGAGACAATAAGATCTTACTAGTGCTGCCACTTCAGTATGGTGTCCCCGATGGTGTCTG 
GTGTATGGGTGTGTTTGGCGTCAGTTGTTACTGGAAAAGTCAGCTCTAATTATGTCTTTATGTGGT 
TAAAGACCCCATAACCTAGATGTCTGGGTTTAACTTAACATGATAGTAACAGTCGGCTGTATAGCC 
TGACGCTTAAACGTTAGATGAATAAGGACTATATTGTGTTGTATAACATTTCTATAACCTCCTTTC 
TATATCATTTAG 

DOMAIN 2A-2 (2nd part of domain a) 

GCCCATGACAACGCATGGTATCGTGGAAACATCAAGTTTGAGAATAAGAAGACTGCAAGAGCTGTT 
GACGATCGCCTTTTCGAGAAGGTTGGACCAGGAGAGAATACCCGACTCTTTGAAGGAATTCTCGAT 
GCTCTTGAACAGGATGAATTCTGCAACTTCGAGATCCAGTTTGAGTTGGCTCACAACGCTATCCAC 
TACCTGGTTGGCGGCCGTCACAC 

INTRON 2A-2/2A-3 (SEQ ID NO: 126) 

GTGAGTCACGTTCTCTGATGGTCACGAGTCACGTTCTCTGATGGTCACGAGTCACGTTCTCTGATG 
GTCACGAGTCACGTTCTCTGATGGTCACGAGTCACATTCTCTGATGGTCACGAGTCACATTCTCTG 
TTGAGTGAAGTCTCAGTACCATTTATTTCTCTTACCTTCTTCTAACCAGGGGTTTCAGCGTGGATC 
GTCTGAGAAGTTAGCGCAAATCTATATTGAAGTCATTTTTCTATCATATAACCATCGTTATATCCA 
CGTGCGAAAGTGTTCATTAATTATTTTTATTTTCATTTATGAAGGTCTAAAAGAAAATATGTATTG 
TTGGAAACTATATTCGAAGGTGAAGGCAACACGAGTGTATTAATATTCTCAATATCAATGTACGCT 
CTGTCAGCACCTGTTTCACCAGGAACTACACCTTTAGCGTACCAAAATATCAGCTGATGATTTCGA 
AGCGGACTATACCCTCACCACTTGTTTTGTGTGTGTATTTATGTGTGCATGTGTGTGCGTGCGTGC 
GTGTGTGTGTGTGTCCTACGTATGTTGATATTTTGTTCTGACTGTATATGTTCGTGCTTACCATTG 
AAG 
Fig. 6b



DOMAIN 2A-3 (3rd part of domain a)

GTACTCCATGTCTCATCTCGAGTACACCTCCTACGACCCCCTCTTCTTCCTCCATCACTCCAACAC 
CGACCGCATCTTCGCCATCTGGCAACGTCTTCAGGTACTCAGAGGAAAGGACCCCAACACCGCCGA 
CTGCGCACACAACCTCATCCATGAGCCCATGGAACCGTTCCGTCGGGACTCGAACCCTCTTGACCT 
CACCAGGGAAAACTCCAAACCAATTGACAGCTTTGATTATGCCCACCTTGGCTACCA 

INTRON 2A-3/2A-4 (SEQ ID NO: 127) 

GTATGTATGATTCTAATAATGAATGTTTTTACCTCCGGTTTAAACAATATTTTAGTATTACGAAAG 
GAGAAGTACCTCGAGAGGTCTAGGTCTCAGATGTTTAGAAACCCATGAAGACAGGTATGCTTCTGA 
AAAACAAAGTAACATCATGAGGCTAAAGTTCAGATTCAAACCATCGTAGTTCGAATCCAGCATGCA 
AAGGGCCCTAACCCTGTAGATGGCGCTGCTTGAAACAGAGTAGTCTGTTCAGGGTCAGTACTGTCC 
CCACAAACATCATAGTCAGGGTCAGTACTGTCCCCACAAACATCATAGTCAGGGTCAGTACTGTCC 
CCACAAACATCACAGTCAGGGTTAATTTTGGATTCGGTTTCGAATGCGAAGAAGACAGTCACGCCC 
TGACACTGGACCGAGGTTGCCGAGAAAGCTCGTGATATTGCTGGAATACTGCCCAGTAAAACCATC 
ATTTATTTTAGGCTATTTATTACGAAAAATAATAATATGTATAGAAATGCATATGATCGCTGTTTG 
AATGTAAAATTTAGAATGGGTTTGGGAGTGTTCACTATTTTTTCATCAAAATTTCATGTATTTTAA 
CCGATCGACGCTGAAGACAAACTACCGTTAATCAGGCAGTTCATTCATATCTGATAGGGAATATTG 
GTTGTTAACCAACGCTACATTGTGTCCAG 

DOMAIN 2A-4 (4th part of domain a) 

GTATGATGACTTGACCCTGAACGGTATGACCCCAGAGGAATTGAACTCATATCTGCATGAACGGTC 
AGGCAAGGAGGGGGTGTTCGCAAGCTTCCGACTCTCAGGTTTTGGCGGCTCTGCTAACGTTGTTGT 
CTACGCATGCCGTCCTGCCCACGATGAAATGGCTGTCGATCAGTGCGACAAAGCCGGCGACTTCTT 
TGTGTTGGGCGGACCCACCGAGATGCCCTGGAGGTTTTACAGAGCATTCCACTTCGACGTCACCGA 
CAGCATCGACAACATCGACAAGGACCGCCACGGCCACTATTATGTAAAGGCGGAATTATTCAGTGT 
AAATGGAAGTGCGCTACCGAATGATCTCCTGCCTCAACCCACCATCTCACACAGGCCAGCCCGCGG 
ACACGTTGATG 

INTRON 2A-4/2B (SEQ ID NO: 128) 

GTAAATGGCCATTGTATACATGCATTCATTTGGACTTTGAGTGAGTGAGTGGATGCGTATTCAGTA 
AGTGAGAGTGTGAGTGGGTATTAGGTCTGTGAGTGGGTTGGTGAGTGGATGGGTGAGTAAGAGTGG 
GTTGGTGAGAAAGTGAGTGAGTCACTTGGTGGGTGCGTTAGTGGAAGCGTGATTGAGTGGATGGGA 
GGTAGGTGAGTGAGTGAATTGGTGGGGGGGTGAGTGAGGTTAACGCTGTTCTGCTGTTCAATCACA 
CCACATGTTGCCAGCTTACTGTGCAGGACGAATCCAGGGTTGTGTTAAATTTTATATGTTTATATA 
TAACGATGGACGTGTCTGGATGTGGCGAATGTGTCAAGAGAATTATGCGGCTTTGTGCTGCTCCGC 
GTATTTATTGCACGCGCGTTGGTACGCGGTTGATAAAGTAGTTCAAAACATTTCCCAGCCATCTTT 
GTCTGTTGTGAAAACCTACTCCAGGACCATCCATTTCAATATGTGTCTGCGTTCATGGAGTTATAC 
ATGTTAAACTGTAGAGCGCAGATGAGCACACTTGAGCATTTCTTCAGTAAATCAGAATGTGTATAT 
TTCAAAATTTACCAAATGCAATATCATCAAGCAAATTATGCAGCTCTATAGTAACATCGGAGTCAA 
TGGTCCAGTGTGCCCTCGGCTGCCATTCCGACCTCCCTGGCCAGAATACACCCCGGTCAGGATCAG 
TTATCCGTCAGAAGGCACGGTGCGGAATGAAAACATAAACACATAGTCGCTTAGTAGTATGCTGAT 
TTAGGCACGCAAAATCCGAATGTGAATTACTGTGAATTGCATTACCTGTTACAG 
DOMAIN 2B

AGGCCCCAGCTCCCTCCTCGGATGCTCACCTCGCCGTCAGGAAGGATATCAACCATCTGACACGCG 
AGGAGGTGTACGAGCTGCGCAGAGCTATGGAGAGATTCCAGGCCGACACATCCGTTGATGGGTACC 
AGGCTACGGTTGAGTATCACGGCTTACCTGCTCGATGTCCATTCCCCGAGGCCACAAATAGGTTCG 
CCTGTTGCATCCACGGCATGGCGACATTCCCTCATTGGCACAGACTGTTCGTTACCCAGGTGGAAG 
ATGCACTGATCAGGCGAGGATCCCCTATAGGGGTCCCCTACTGGGACTGGACTCAGCCTATGGCAC 
ATCTCCCAGGACTTGCAGACAACGCCACCTATAGAGATCCCATCAGCGGAGACAGCAGACACAACC 
CGTTCCACGATGTTGAAGTTGCGTTTGAAAATGGGCGTACAGAACGTCACCCAGATAGTAGATTGT 
TTGAACAACCTCTATTTGGCAAACATACGCGTCTCTTCGACAGTATAGTCTATGCTTTTGAGCAGG 
AGGACTTCTGCGATTTTGAAGTTCAATTTGAGATGACCCATAATAATATTCACGCCTGGATTGGTG 
GCGGCGGGAAGTATTCCATGTCTTCTCTACACTACACAGCCTTCGACCCTATCTCCTACCTTCATC 
ACTCCAACACTGACCGTCTCTGGGCAATTTGGCAAGCGTTGCAGATACGAAGAAACAAACCGTATA 
AGGCTCATTGTGCTTGGTCTGAGGAACGCCAGCCTCTCAAACCTTTCGCCTTCAGTTCCCCACTGA 
ACAACAACGAAAAAACCTACGAAAACTCGGTGCCCACCAACGTTTACGACTACGAAGGAGTCCTTG 
GCTATACTTATGATGACCTCAACTTCGGGGGCATGGACCTGGGTCAGCTTGAGGAATACATCCAGA 
GGCAGAGACAGAGAGACAGGACCTTTGCTGGCTTCTTTCTGTCACATATTGGTACATCAGCGAATG 
TTGAAATCATTATAGACCATGGGACTCTTCATACCTCCGTGGGCACGTTTGCTGTTCTTGGCGGAG 
AGAAGGAGATGAAATGGGGATTTGACCGTTTGTACAAATATGAGATTACAGATGAACTGAGGCAAC 
TTAATCTCCGTGCTGATGATGGTTTCAGCATCTCTGTTAAAGTAACTGATGTTGATGGCAGTGAGC 
TGTCCTCTGAACTCATCCCATCTGCTGCTATCATCTTCGAACGAAGCCATA 

INTRON 2B/2C (SEQ ID NO: 129) 

GTAAGTAGCTACCTGTTTATTCAATTTTTTCGCTTTGCCAATCAATTCATTCAGCTTGAAATTCAA 
TAATTGTGTTTTGCATGGCTGAAAACCAATTTGAACTCTTTTCTTTTCTCAGGTCGAACTCAAATA 
AATAATCACTAATTGTTATGCACGCGGGTAGGGCATACATACTATATCCACATCGGTCATCTCAAA 
ATGCAAACAAATTGTCTTATTTCCGTTGGGACAAGCAAACCCCCTTTCCTGTAATCTTGCCTTTGG 
CATCCACTGGAATTAATGTTGACTGGTAATTGATACTGGCTCTCTTCTTGCATAGAGTTAATATCT 
ATAGTTTGTAAATCTTTATGATTTTGCTATTTATATTTCGACAGCATGCTATAGACACCCTAGACT 
ATTGTATAGCCACTTGTATTGTTTTTCCATTTATTATTTATAACAGAACATGGCTTGTAATTTTTA 
TTTACCTTCCAG 

DOMAIN 2C 

TTGACCATCAGGACCCTCATCAGGACACAATCATCAGGAAAAATGTTGATAATCTTACACCCGAGG 
AAATTAATTCTCTGAGGAGGGCAATGGCAGACCTTCAATCAGACAAAACCGCCGGTGGATTCCAGC 
AAATTGCTGCTTTTCACGGGGAACCCAAATGGTGCCCAAGTCCCGATGCTGAGAAGAAGTTCTCCT 
GCTGTGTCCATGGAATGGCTGTCTTCCCTCACTGGCACAGACTCCTGACCGTGCAAGGCGAGAATG 
CCCTGAGAAAGCATGGATGTCTCGGAGCTCTCCCCTACTGGGACTGGACTCGGCCCCTGTCTCACC 
TACCTGATTTGGTAAGTCAGCAGAACTACACCGATGCCATATCCACCGTGGAAGCCCGAAACCCCT 
GGTACAGCGGCCATATTGATACAGTTGGTGTTGACACAACAAGAAGCGTCCGTCAAGAACTGTATG 
AAGCTCCCGGATTTGGTCATTATACTGGGGTCGCTAAGCAAGTGCTTCTGGCTTTGGAGCAGGATG 
ACTTCTGTGATTTTGAAGTCCAGTTTGAGATAGCTCACAATTTCATCCACGCTCTTGTCGGCGGAA 
GCGAGCCATATGGTATGGCGTCACTCCGTTACACTACTTATGATCCAATTTTCTACCTCCATCATT 
CTAACACTGACAGACTCTGGGCTATATGGCAGGCTCTACAAAAGTACAGGGGCAAACCTTACAATT 
CCGCCAACTGTGCCATTGCTTCTATGAGAAAACCCCTACAGCCCTTTGGTCTGACTGATGAGATCA 
ACCCGGATGATGAGACAAGACAGCATGCTGTTCCTTTCAGTGTCTTTGATTACAAGAACAACTTCA 
ATTATGAATATGACACCCTTGACTTCAACGGACTATCAATCTCCCAGCTGGACCGTGAACTGTCAC 
GGAGAAAGTCTCATGACAGAGTATTTGCCGGATTTTTGCTGCATGGTATTCAGCAGTCTGCACTAG 
TTAAATTCTTTGTCTGCAAATCAGATGATGACTGTGACCACTATGCTGGTGAATTCTACATCCTTG
GTGATGAAGCTGAAATGCCATGGGGCTATGATCGTCTTTACAAATATGAGATCACTGAGCAGCTCA 
ATGCCCTGGATCTACACATCGGAGATAGATTCTTCATCAGATACGAAGCGTTTGATCTTCATGGTA 
CAAGTCTTGGAAGCAACATCTTCCCCAAACCTTCTGTCATACATGACGAAGGGGCAG 

INTRON 2C/2D (SEQ ID NO: 130) 

GTGAGAACATTGATAATAGTTCAAATGAAGTATATCCGATTCAAGCTGTCGATACAAGATGAGATA 
CATAATCACAATGTTTGTATTAGATATCTCTCTTAATTTAATGCCGCTTTTATCAATATTCGAGCA 
ATCCTTCAGCAACATACACCAGCAAATGTTTCATCAACAGACTATATTATTTAATATTTTAAAAAT 
CCTTCTCTGTTGTTATAAATACTTAAAGTATCGAATTCCTTGAATGCGTCTTCTCTGCAGCATATA 
GTTAAGTTGTTGTGTTTCTCTGTCAG 



DOMAIN 2D 

GTCACCATCAGGCTGACGAGTACGACGAAGTTGTAACTGCTGCAAGCCACATCAGAAAGAATTTAA 
AAGATCTGTCAAAGGGAGAAGTAGAGAGCCTAAGGTCTGCCTTCCTGCAACTTCAGAAGGACGGAG 
TCTATGAGAATATTGCCAAATTCCACGGCAAGCCTGGGTTGTGTGATGATAACGGTCGCAAGGTTG 
CCTGTTGTGTCCATGGAATGCCCACCTTCCCCCAGTGGCACAGACTCTATGTCCTCCAGGTGGAGA 
ATGCTTTGCTGGAGAGAGGATCTGCCGTCTCTGTGCCATACTGGGACTGGACTGAAACATTTACAG 
AGCTGCCATCTTTGATTGCTGAGGCTACCTATTTCAATTCCCGTCAACAAACGTTTGACCCTAATC 
CTTTCTTCAGAGGTAAAATCAGTTTTGAGAATGCTGTTACAACACGTGATCCCCAGCCTGAGCTGT 
ACGTTAACAGGTACTACTACCAAAACGTCATGTTGGCTTTTGAACAGGACAACTACTGCGACTTCG 
AGATACAGTTTGAGATGGTTCACAATGTTCTCCATGCTTGGCTTGGTGGAAGAGCTACTTATTCTA 
TTTCTTCTCTTGATTATTCTGCATTCGACCCTGTGTTTTTCCTTCACCATGCGAACACAGATAGAT 
TGTGGGCCATCTGGCAGGAGCTGCAGAGGTACAGGAAGAAGCCATACAATGAAGCGGATTGTGCCA 
TTAACCTAATGCGCAAACCTCTACATCCCTTCGACAACAGTGATCTCAATCATGATCCTGTAACCT 
TTAAATACTCAAAACCCACTGATGGCTTTGACTACCAGAACAACTTTGGATACAAGTATGACAACC 
TTGAGTTCAATCATTTCAGTATTCCCAGGCTTGAAGAAATCATTCGTATTAGACAACGTCAAGATC 
GTGTGTTTGCAGGATTCCTCCTTCACAACATTGGGACATCCGCAACTGTTGAGATATTCGTCTGTG 
TCCCTACCACCAGCGGTGAGCAAAACTGTGAAAACAAAGCCGGAACATTTGCCGTACTCGGAGGAG 
AAACAGAGATGGCGTTTCATTTTGACAGACTCTACAGGTTTGACATCAGTGAAACACTGAGGGACC 
TCGGCATACAGCTGGACAGCCATGACTTTGACCTCAGCATCAAGATTCAAGGAGTAAATGGATCCT 
ACCTTGATCCACACATCCTGCCAGAGCCATCCTTGATTTTTGTGCCTGGTTCAA 

INTRON 2D/2E (SEQ ID NO: 131) 

GTAAGAAAGTTTCACTGTCTAAATCTTTTTTTATGATAGAGGGTAGAGAAGTGGAGACAATGTGAC 
AATATATTGAATAAAGTTGTTTAAAATTTATAACTCTCATAAGTTCATATTATGCTGAAGCTGTAG 
CCATCTATAACTGTGTAACATGAAATGTTAAGACATTAACCTAAATACTTCAGCTGATAACAAAAC 
AATGTTAATACATACGTCAATGTAACATTTTCTTATCTTTAGGTTATAGCATAAACACTTCAGAGA 
TACAGTGACGAAAACCTCTATTTAAATATTTCAG 

DOMAIN 2E 

GTTCTTTCCTGCGTCCTGATGGGCATTCAGATGACATCCTTGTGAGAAAAGAAGTGAACAGCCTGA 
CAACCAGGGAGACTGCATCTCTGATCCATGCTCTGAAAAGTATGCAGGAAGACCATTCACCTGATG 
GGTTCCAAGCCATTGCCTCTTTCCATGCCCTGCCACCACTCTGCCCTTCACCATCTGCAACTCACC 
GTTATGCTTGCTGTGTCCACGGCATGGCTACATTTCCCCAGTGGCACAGACTGTACACTGTACAGT 
TCCAGGATGCACTGAGGAGACATGGAGCTGCAGTAGGTGTACCGTATTGGGATTGGCTGCGACCGC
AGTCTCACCTACCAGAGCTTGTCACCATGGAGACATACCATGATATTTGGAGTAACAGAGATTTCC 
CCAATCCTTTCTACCAAGCCAATATTGAGTTTGAAGGAGAAAACATTACAACAGAGAGAGAAGTCA 
TTGCAGACAAACTTTTTGTCAAAGGTGGACACGTTTTTGATAACTGGTTCTTCAAACAAGCCATCC 
TAGCGCTTGAGCAGGAAAACTACTGTGACTTTGAGATTCAGTTTGAAATTCTTCACAACGGCGTTC 
ACACGTGGGTCGGAGGCAGTCGTACCCACTCTATCGGACATCTCCATTACGCATCCTACGACCCTC 
TTTTCTACCTCCACCATTCCCAGACAGACCGTATTTGGGCAATCTGGCAAGAACTCCAGGAACAGA 
GAGGGCTCTCAGGTGATGAGGCTCACTGTGCTCTCGAGCAAATGAGAGAACCATTGAAGCCTTTCA 
GCTTCGGCGCTCCTTATAACTTGAATCAGCTAACACAGGATTTCTCCCGACCCGAGGACACCTTCG 
ACTACAGGAAGTTTGGTTATGAATATGACAATTTAGAATTCCTAGGAATGTCAGTTGCTGAACTGG 
ATCAATACATTATTGAACATCAAGAAAATGATAGAGTATTCGCTGGGTTCCTGTTGAGTGGATTCG 
GAGGTTCCGCATCAGTTAATTTCCAGGTTTGTAGAGCTGATTCCACATGTCAGGATGCTGGGTACT 
TCACCGTTCTTGGTGGCAGTGCTGAGATGGCGTGGGCATTTGACAGGCTATACAAATATGACATTA 
CTGAAACTCTGGAGAAAATGCACCTTCGATATGATGATGACTTCACAATCTCTGTCAGTCTGACCG 
CCAACAACGGAACTGTCCTGAGCAGCAGTCTAATCCCAACACCGAGTGTCATATTCCAGCGGGGAC 
ATC 



INTRON 2E/2F-1 (SEQ ID NO: 132) 

GTAAGTAGTAAACTGCTCAGATTGTTTTCATAATTACTCCACTATTAAGTAAAAAGTACTAGTAAT 
TCAATAGTACTGTTCACAGAGAAATGTAACACAATAGACCACAGAGTCCATTTGTTAAACGCCTTT 
GGCTTGGTAAGTCTGAGATTTTGGTGACTGATGGAAAGCTAAAATATATTTTGACAG 

DOMAIN 2F-1 (1st part of domain f) 

GTGACATAAATACCAAGAGCATGTCAGCGAACCGTGTTCGCCGTGAGCTGAGCGATCTGTCTGCGA 
GGGACCCGTCTAGTCTCAAGTCTGCTCTGCGAGACCTACAGGAGGATGATGGCCCCAACGGATACC 
AGGCTCTTGCAGCCTTCCATGGGCTACCAGCAGGCTGCCATGATAGCCAGGGAAATGAG 

INTRON 2F-1/2F-2 (SEQ ID NO: 133) 

GTATATTTAAGTATTTTATCTTACGCATGACCCTGACCCTATTTATTTTTTTTTAATCCTCGGATT 
TGTTTAATCCTGTTACCAGCGAAGGTCCGGGTTAGAATTGATCTTCAGTCAACTATTCTTGTCGTA 
GGACTAACGAGTTGTCTGGCTTGCTTACTCGGTTGACACGTGTCAACGGATCCCAATTGCAATTAG 
ATCGATGCTCATGCTGTTGATCCCTGGATTGCCTGGTCCGGACTCCACATACCGCCGCCATATTGC 
TGGTATATTGTCGAATGCGACGCTAAACAGCAAGCCAACCAACAATACTGAGACCTGGTGGTACAT 
GTCAGTTCTCTATTGCTGGGGTTCCAAACATAGCCATCAGTTGAAATATTTCATACATAGAAGAAT 
ACCTCTGAATATGATGATGAAACATTTACTTAGACTTGCCTGTGAGCCCCAGGCAAAATGCACTGT 
AAAAATACACTGACAGAGGATTAGGCATTCTTGGGAGTACTGTATAGTTAGTTGCATACATATTAG 
CGTTCCCTCACTAAAACGAATCTCTGAATGCTATCAATTAAAGATCATGATGCTTTGATTGTGTCT 
ACTGTATTTAAAATGGTGTTAAGATTTGCAATTACAATATACACAAACACGTTTCCTGCATCTCGG 
AGAATGCAATCTTTCGTTGTACGCGTCTGTTTTCATATTTTTATGCATGTAGTTTGCACTACTTAG 
CGTCCAATAAATCCATTCACAAAATCACACAAACAAACGATTTTAGGAATGTGACTGTAGCTGCAA 
CGAATATACCTGATCCTTTCTTGTTCCAG 

DOMAIN 2F-2 (2nd part of domain f) 

ATCGCATGTTGCATTCACGGTATGCCGACCTTCCCCCAGTGGCACAGACTGTACACCCTGCAGTTG 
GAGATGGCTCTGAGGAGACATGGATCATCTGTCGCCATCCCCTACTGGGACTGGACAAAGCCTATC 
Fig. 6f



TCCGAACTCCCCTCGCTCTTCACCAGCCCTGAGTATTATGACCCATGGCATGATGCTGTGGTAAAC 
AACCCATTCTCCAAAGGTTTTGTCAAATTTGCAAATACCTACACAGTAAGAGACCCACAGGAGATG 
CTGTTCCAGCTTTGTGAACATGGAGAGTCAATCCTCTATGAGCAAACTCTTCTTGCTCTAGAGCAA 
ACCGACTACTGTGATTTTGAGGTACAGTTTGAGGTCCTCCATAACGTGATCCACTACCTTGTTGGC 
GGACGTCAGACCTACGCATTGTCTTCTCTGCATTATGCATCCTACGACCCATTCTTCTTTATACAC 
CATTCCTTTGTGGATAAGATGTGGGTAGTATGGCAAGCTCTTCAAAAGAGGAGGAAACTTCCATAC 
AAGCGAGCTGACTGTGCTGTCAACCTAATGACTAAACCAATGAGGCCATTTGACTCCGATATGAAT 
CAGAACCCATTCACAAAGATGCACGCAGTTCCCAACACACTCTATGACTACGAGACACTGTACTAC 
AGCTACGATAATCTCGAAATAGGTGGCAGGAATCTCGACCAGCTTCAGGCTGAAATTGACAGAAGC 
AGAAGCCACGATCGCGTTTTTGCTGGATTCTTGCTTCGTGGAATCGGAACTTCTGCTGATGTCAGG 
TTTTGGATTTGTAGAAATGAAAATGACTGCCACAGGGGTGGAATAATTTTCATCTTAGGTGGAGCC 
AAGGAAATGCCATGGTCATTTGACAGAAACTTCAAGTTTGATATCACCCATGTACTCGAGAAAGCT 
GGCATTAGCCCAGAGGACGTGTTTGATGCTGAGGAGCCATTTTATATCAAGGTTGAGATCCATGCT 
GTTAACAAGACCATGATACCATCGTCTGTGATCCCAGCCCCAACTATCATCTATTCTCCTGGGGAA 
G 

INTRON 2F-2/2G-1 (SEQ ID NO: 134)

GTGAGAGAACCAGTAATAGCTACTGTCTACAAAGAATGTGTTCATTTAAAGACCTGACTGTAGGCC 
GATGGCTGCTGTCATCTCCTCCGCCTCCTCCTCCTGTTCCTCCTCCGAAGGGGTCAGCTTCAGGTT 
CTCTTGCCAATATGCCAAGCAGACCTCCTGAGCAGGCAGTATATATACGTAAGGGAAGCAAGTATG 
GACCATCGCGCGGCATGTAGAGATACAATGATCAGCTGTCTGCTGTTCCACTCCTGTCAGACAATG 
AGATAAACATGAATACAGTATTACTCAGCAGCGTTCCAATTTTCAACCCTCGTATTTATTAAAAAA 
AGGAATTTTTAATATATTTTTCTCCTTGTTGAAATATTTTAGTAACTGTTAATCGATATAGAGTGG 
AGTAGTGACGCTTTATTTCGGTTCATTCTCGAAACAAAAATATAATAGTCCACTGAACTCTCTTAA 
ATTGTTTTTACAACCTTCAACTGCCACAGACGTAATCCCTCACGTTATTTTGAGCTGACAACGTGT 
TGAATTGAGTGTGTTCCGAATTCTAAATAAGCATGTATATATTTACGTCTCATGCAAGTAATATAT 
GTTTAACTGATGACGTCACTTGGTGACCACTGATTTAGTTCCTTTGTCATAATTGCAGTTTCTGTT 
GTCACGGGGACGGTGGGGAAGCCAGGTTCCTCCTGTCACGCTGAATATCCCGTTCGAATCCCCCAC 
ATGGGTACAAAGTGTGATGCCTATTTCTGGTGTCCCCCACCGTGATATTGCTGGAATAAGTGGCTT 
AATACCATATACACTCACTCTATTGTCACACTACTGCCACCGGCTCACACCTCTGATGCTTCTGTT 
CTATCCAG 

DOMAIN 2G-1 (1st part of domain g) 

GTCGCGCTGCTGACAGTGCACACTCAGCCAACATTGCTGGCTCTGGGGTGAGGAAGGACGTCACGA 
CCCTCACTGTGTCTGAGACCGAGAACCTAAGACAGGCTCTTCAAGGTGTCATCGATGATACTGGTC 
CCAATGGTTACCAAGCAATAGCATCCTTCCACGGAAGTCCTCCAATGTGCGAGATGAACGGCCGCA 
AGGTTGCCTGTTGTGCTCACG 

INTRON 2G-1/2G-2 (SEQ ID NO: 135) 

GTAATTAATGGATGTGAAGTCAATGTCCGAGGGTATAATAAGGATTTAAATACTTCAGTCGTGTAA 
TACTGTATGACATGTGTATTGGATGGTGTAGGTATTACAGGTTATAAGGCCAGTGTGTGTTGGGAC 
GGTTACTTTCCTGCACTAGTAATAAGCATTGTATTTAGCTAGCTTTTATCATATAACTTTAGTTTC 
ATGGTTTGTGGCAATTGAAATCGAAATTTTCTTTCATTTCAAGGTTATCGCACTCGTGTGTTAGAA 
TAGTTACTATGCTGCATTGAGAATAACACTATAGTAATAAAGCATATCATACAGTAAGAATAACAC 
TATAGTAATAAAGTATATCATACAGTAAGAATGTCATTGTATGATAAATAGGTTATCACACTCGTG 
TGTTTTAGAATGGTTACTATCCCAGGAATAACCACTATGTATTACATGTATATTGGGCAGTGTAAG 
TAGTAGCATTGTATATTAAATCAGTATATCGTGCTTCAAAACACCAGGATATATGGGGTATACAGT 
GGGCAGTGTAAGTAGCAACATTGTATATTAAATCAGTATATCGTACTTCAAAACACCAGGATTATG
GGGTATACAGTGGGCAGTGTAAGTAGTAGCATTGTATATTAAATCAGTATATCGTACTTCAAAACA 
CCAGGATATAATTCAGTATATCGTGCTTCAAAACACCAGGATATAATTCAGTATATCGTGCTTCAA 
AACACCAGGATATATGGGATATACAGTGCGGGTTTGCATACAACCTCCACCCTTTACAG 

DOMAIN 2G-2 (2nd part of domain g) 

GTATGGCCTCCTTCCCACACTGGCACAGACTGTATGTGAAGCAGATGGAAGACGCCCTGGCTGACC 
ACGGATCACATATCGGCATCCCTTACTGGGACTGGACAACTGCCTTCACAGAGTTACCCGCCCTTG 
TCACAGACTCCGAGAACAATCCCTTCCATGAG 

INTRON 2G-2/2G-3 (SEQ ID NO: 136)

GTCAGTTTAGTCTCCTGTCTGAGCTAACGATACCAATTTCCTATTTTCGAGAACCACGATGACGAG 
AAAACAAGCAATATAGATATAGATGCAGTATAGATCAAGTTAATGAATTCATTGCTATATGTTTGC 
TTGTAATAAACTTTAAGAAAACGAGAGCATGCACACAAATGAAACAAACAATTATGTGTTTGATAG 
GAATATGATATATGTATTTGGGGGCTGACGTGAGCAGGGTTGAAGGGACAGTTTACATTGTCAGTA 
ACACTGGGAGTATTCTTTGATCCACAATATATAGTTTCATTGTGTTCAGCAGTTACAACTAACATT 
ATATCATACATTACGTCGTAACATGCTTCTTTTGTCCTCTTCTGCCAG 

DOMAIN 2G-3 (3rd part of domain g)

GGTCGCATTGATCATCTCGGTGTAACCACGTCACGTTCCCCCAGAGACATGCTGTTTAACGACCCA 
GAGCAAGGATCAGAGTCGTTCTTCTATAGACAAGTCCTCCTGGCTTTGGAGCAGACTGACTACTGC 
CAGTTCGAAGTCCAGTTTGAGCTGACCCACAACGCCATTCACTCCTGGACAGGTGGACGTAGCCCT 
TACGGAATGTCGACCCTCGAGTTCACAGCCTACGATCCTCTCTTCTGGCTTCACCACTCCAACACC 
GACAGAATCTGGGCTGTCTGGCAAGCACTGCAGAAATACCGAGGACTCCCATACAACGAAGCACAC 
TGTGAAATCCAGGTTCTGAAACAGCCCTTGAGGCCATTCAACGATGACATCAACCACAATCCAATC 
ACCAAGACTAATGCCAGGCCTATCGATTCATTTGATTATGAGAGGTTTAACTATCAGTATGACACC 
CTTAGCTTCCATGGTAAGAGCATCCCTGAACTGAATGACCTGCTCGAGGAAAGAAAAAGAGAAGAG 
AGAACATTTGCTGCCTTCCTTCTTCGTGGAATCGGTTGCAGTGCTGATGTCGTCTTTGACATCTGC 
CGCCCCAATGGTGACTGTGTCTTTGCAGGAACCTTTGCTGTGCTGGGAGGGGAGCTAGAAATGCCT 
TGGTCCTTCGACAGACTGTTCCGCTATGACATCACCAGAGTCATGAATCAGCTCCATCTCCAGTAT 
GATTCAGATTTCAGTTTCAGGGTGAAGCTTGTTGCAACCAATGGCACTGAGCTTTCATCAGACCTC 
CTCAAGTCACCAACAATTGAACATGAACTTGGAG 

INTRON 2G-3/2H (SEQ ID NO: 137) 

GTATGTTATCTTATTATCAAATGTGTAATCAGATACTGGAGACGTTTTCATATTAACTTGGTCAGC 
ATTAGTTGATGATTTTGGTGCGATATTGACGACAAGGAGTTAAGCATTAACACGTTCAACACATCT 
TTAATCTGATATGAGAAGGGAATAAATTGATCCAGTATTGATGATTGAAGTTAGATTAACAGTGAA 
AGATATACCAGTTTTGATAATCGTATAAAACAGTAGCAGAATTGTATCGTGAAAACTAAATGTGGG 
AAGGCGAACGCCAAGCAGATTTTAGATTACGATCGTGTGCTAGAATAATTCACAATAACCCAGACG 
TCGGAAATGTGGTTGTCTATGGCAATAGTTACGATTAATTGCTAACATGCACGATTTACCTATTTC 
AG 

DOMAIN 2H 

CCCACAGAGGACCAGTTGAAGAAACAGAAGTCACTCACCAAAATACTGACGGCAATGCACACTTCC 
ATCGTAAGGAAGTTGATTCGCTGTCCCTGGATGAAGCAAACAACTTGAAGAATGCCCTTTACAAGC 
TACAGAACGACCACAGTCTAACAGGATACGAAGCAATCTCTGGTTACCATGGATACCCGAATCTGT
GTCCGGAAGAAGGCGATGACAAATACCCCTGCTGCGTCCACGGAATGGCCATCTTCCCCCACTGGC 
ACAGACTCTTGACCATCCAACTGGAAAGAGCTCTCGAGCACAATGGTGCACTGCTTGGTGTTCCTT 
ACTGGGACTGGACCAAGGACCTGTCGTCACTGCCGGCGTTCTTCTCCGACTCCAGCAACAACAATC 
CCTACTTCAAGTACCACATCGCAGGTGTTGGTCACGACACCGTCAGAGAGCCAACTAGTCTTATAT 
ATAACCAGCCCCAAATCCATGGTTATGATTATCTCTATTACCTAGCATTGACCACGCTTGAAGAAA 
ACAATTACTGTGACTTTGAGGTTCAGTATGAGATCCTCCACAACGCCGTCCACTCCTGGCTTGGAG 
GATCCCAGAAGTATTCCATGTCTACCCTGGAGTATTCGGCCTTTGACCCTGTCTTTATGATCCTTC 
ACTCGGGTCTAGACAGACTTTGGATCATCTGGCAAGAACTTCAGAAGATCAGGAGAAAGCCCTACA 
ACTTCGCTAAATGTGCTTATCATATGATGGAAGAGCCACTGGCGCCCTTCAGCTATCCATCTATCA 
ACCAGGACGAGTTCACCCGTGCCAACTCCAAGCCTTCTACAGTTTTTGACAGCCATAAGTTCGGCT 
ACCATTACGATAACCTGAATGTTAGAGGTCACAGCATCCAAGAACTCAACACAATCATCAATGACT 
TGAGAAACACAGACAGAATCTACGCAGGATTTGTTTTGTCAGGCATCGGTACGTCTGCTAGTGTCA 
AGATCTATCTCCGAACAGATGACAATGACGAAGAAGTTGGAACTTTCACTGTCCTGGGAGGAGAGA 
GGGAAATGCCATGGGCCTACGAGCGAGTTTTCAAGTATGACATCACAGAGGTTGCAGATAGACTTA 
AACTAAGTTATGGGGACACCTTTAACTTCCGACTAGAGATCACATCCTACGATGGATCGGTGGTAA 
ACAAGAGCCTACCCAATCCTTTCATCATCTACAGACCTGCCAATCATGACTACGATGTTCTTGTTA 
TCCCAGTAGGAAGAAACCTTCACATCCCTCCCAAAGTTGTCGTCAAGAGAGGCACCCGCATCGAGT 
TCCACCCAGTCGATGATTCAGTTACGAGACCAGTTGTTGATCTTGGAAGCTACACTGCACTCTTCA 
ACTGTGTGGTACCACCGTTCACATACCGCGGATTCGAACTGAACCACGTCTATTCTGTCAAGCCTG 
GTGACTACTATGTTACCGGACCAACGAGAGACCTTTGCCAGAATGCAGATGTCAGGATTCATATCC 
ATGTTGAGGATGAGTAA 

3'UTR 

CGCAACAG 



INTRON 3'UTR (SEQ ID NO: 138)

GTGAGATAAGAAACCCTTCTAACAGTAATACGACACCACATTACAGCTTAAACATGATTGCCATCG 
ATGTTTTCATGTGTAGTATACGCTTTTCAGTTCTACATAATTTTGTTTTTCAAATCAAGTTTAGCA 
AATGAATCTATCACTGGAAAATAGGGTAGGGTAGCCAAGTGGTTAAAGCGGTCACTGATCACGCCA 
AAGACGAGTGTCCTAACCTGCATGGGTACAAAAGTGAAGACCATTGCTGGTGTCTACCGCCGTAAT 
ATTGTTTTTAGTATTGCTAAAACTTATACTCACCCATGCGCTGTAAAAGTGGAATAATAATCATAT 
TTCAACAAAAGCACAAAACCATTTCATTTTCATGAAAGCCTCTTGTTCACCTGAAAGACGCAAGAG 
AACAATAGTTCCTAACATTATTTTCAGACATTGGAAATGTCCTGCACGTGTAAACCATATATCCTT 
TGAAATTTTTACGACTGCATCGTATACAATTTATGATATAAATTTAAAACTTTATTTCAG 

3'UTR 

GTTTCTTGGTCTCCACATATTCACACATCAGCACCAAACGGTTTCGAAGGACATTGGCGTTCTTCT 
CTGGCAATGCATTTCAATACAACATTGAAAATGACTTCAGCATATCAGTGTGCTTCGAACGTGTTC 
CGGAAGTACTCAAATGTGCTATGACTGAATTATTGTACATACATAACTTATTGATGTTCAATAAAT 
AAATGTTGAAACG 
Primary structure of the HtH2 protein



DOMAIN A (SEQ ID NO: 156) 

GLPYWDWTQHLTQLPDLVSDPLFVDPEGGKAHDNAWYRGNIKFENKKTARAVDDRLFEKVGPGENT 
RLFEGILDALEQDEFCNFEIQFELAHNAIHYLVGGRHTYSMSHLEYTSYDPLFFLHHSNTDRIFAI 
WQRLQVLRGKDPNTADCAHNLIHEPMEPFRRDSNPLDLTRENSKPIDSFDYAHLGYQYDDLTLNGM 
TPEELNSYLHERSGKEGVFASFRLSGFGGSANVVVYACRPAHDEMAVDQCDKAGDFFVLGGPTEMP 
WRFYRAFHFDVTDSIDNIDKDRHGHYYVKAELFSVNGSALPNDLLPQPTISHRPARGHVDEAPAPS
SDAHLAVRKDINHLTREEVYELRRAMERFQADTSVDGYQATVEYHGLPARCPFPEATNRFACCIHG 
MATFPHW 

DOMAIN B 

HRLFVTQVEDALIRRGSPIGVPYWDWTQPMAHLPGLADNATYRDPISGDSRHNPFHDVEVAFENGR
TERHPDSRLFEQPLFGKHTRLFDSIVYAFEQEDFCDFEVQFEMTHNNIHAWIGGGGKYSMSSLHYT 
AFDPISYLHHSNTDRLWAIWQALQIRRNKPYKAHCAWSEERQPLKPFAFSSPLNNNEKTYENSVPT 
NVYDYEGVLGYTYDDLNFGGMDLGQLEEYIQRQRQRDRTFAGFFLSHIGTSANVEIIIDHGTLHTS
VGTFAVLGGEKEMKWGFDRLYKYEITDELRQLNLRADDGFSISVKVTDVDGSELSSELIPSAAIIF 
ERSH 

DOMAIN C 

IDHQDPHQDTIIRKNVDNLTPEEINSLRRAMADLQSDKTAGGFQQIAAFHGEPK^CPSPDAEKKFS 
CCVHGMAVFPHWHRLLTVQGENALRKHGCLGALPYWDWTRPLSHLPDLVSQQNYTDAISTVEARNP 
WYSGHIDTVGVDTTRSVRQELYEAPGFGHYTGVAKQVLLALEQDDFCDFEVQFEIAHNFIHALVGG 
SEPYGMASLRYTTYDPIFYLHHSNTDRLWAIWQALQKYRGKPYNSANCAIASMRKPLQPFGLTDEI 
NPDDETRQHAVPFSVFDYKNNFNYEYDTLDFNGLSISQLDRELSRRKSHDRVFAGFLLHGIQQSAL 
VKFFVCKSDDDCDHYAGEFYILGDEAEMPWGYDRLYKYEITEQLNALDLHIGDRFFIRYEAFDLHG 
TSLGSNIFPKPSVIHDEGA

DOMAIN D 

GHHQADEYDEVVTAASHIRKNLKDLSKGEVESLRSAFLQLQNDGVYENIAKFHGKPGLCDDNGRKV 
ACCVHGMPTFPQWHRLYVLQVENALLERGSAVSVPYWDWTETFTELPSLIAEATYFNSRQQTFDPN 
PFFRGKISFENAVTTRDPQPELYVNRYYYQNVMLAFEQ 

ISSLDYSAFDPVFFLHHANTDRLWAIWQELQRYRKKPYNEADCAINLMRKPLHPFDNSDLNHDPVT 
FKYSKPTDGFDYQNNFGYKYDNLEFNHFSIPRLEEIIRIRQRQDRVFAGFLLHNIGTSATVEIFVC 
VPTTSGEQNCENKAGTFAVLGGETEMAFHFDRLYRFDISETLRDLGIQLDSHDFDLSIKIQGVNGS 
YLDPHILPEPSLIFVPGSS 

DOMAIN E 

SFLRPDGHSDDILVRKEVNSLTTRETASLIHALKSMQEDHSPDGFQAIASFHALPPLCPSPSATHR 
YACCVHGMATFPQWHRLYTVQFQDALRRHGAAVGVPYWDWLRPQSHLPELVTMETYHDIWSNRDFP 
NPFYQANIEFEGENITTEREVIADKLFVKGGHVFDNWFFKQAILALEQENYCDFEIQFEILHNGVH 
TWVGGSRTHSIGHLHYASYDPLFYLHHSQTDRIWAIWQELQEQRGLSGDEAHCALEQMREPLKPFS 
FGAPYNLNQLTQDFSRPEDTFDYRKFGYEYDNLEFLGMSVAELDQYIIEHQENDRVFAGFLLSGFG 
GSASVNFQVCRADSTCQDAGYFTVLGGSAEMAWAFDRLYKYDITETLEKMHLRYDDDFTISVSLTA 
NNGTVLSSSLIPTPSVIFQRGH
DOMAIN F

RDINTKSMSANRWRELSDLSARDPSSLKSALRDLQEDDGPNGYQALAAFHGLPAGCHDSQGNEIA 
CCIHGMPTFPQWHRLYTLQLEMALRRHGSSVAIPYWDWTKPISELPSLFTSPEYYDPWHDAWNNP
FSKGFVKFANTYTVRDPQEMLFQLCEHGESILYEQTLLALEQTDYCDFEVQFEVLHNVIHYLVGGR
QTYALSSLHYASYDPFFFIHHSFVDKMWWWQALQKRRKLPYKRADCAVNLMTKPMRPFDSDMNQN
PFTKMHAVPNTLYDYETLYYSYDNLEIGGRNLDQLQAEIDRSRSHDRVFAGFLLRGIGTSADVRFW 
ICRNENDCHRGGIIFILGGAKEMPWSFDRNFKFDITHVLEKAGISPEDVFDAEEPFYIKVEIHAVN
KTMIPSSVIPAPTIIYSPGE 

DOMAIN G 

GRAADSAHSANIAGSGVRKDVTTLTVSETENLRQALQGVIDDTGPNGYQAIASFHGSPPMCEMNGR
KVACCAHGMASFPHWHRLYVKQMEDALADHGSHIGIPYWDWTTAFTELPALVTDSENNPFHEGRID
HLGVTTSRSPRDMLFNDPEQGSESFFYRQVLLALEQTDYCQFEVQFELTHNAIHSWTGGRSPYGMS
TLEFTAYDPLFWLHHSNTDRIWAVWQALQKYRGLPYNEAHCEIQVLKQPLRPFNDDINHNPITKTN 
ARPIDSFDYERFNYQYDTLSFHGKSIPELNDLLEERKREERTFAAFLLRGIGCSADWFDICRPNG
DCVFAGTFAVLGGELEMPWSFDRLFRYDITRVMNQLHLQYDSDFSFRVKLVATNGTELSSDLLKSP
TIEHEL

DOMAIN H

GAHRGPVEETEVTHQNTDGNAHFHRKEVDSLSLDEANNLKNALYKLQNDHSLTGYEAISGYHGYPN 
LCPEEGDDKYPCCVHGMAIFPHWHRLLTIQLERALEHNGALLGVPYWDWTKDLSSLPAFFSDSSNN 
NPYFKYHIAGVGHDTVREPTSLIYNQPQIHGYDYLYYLALTTLEENNYCDFEVQYEILHNAVHSWL 
GGSQKYSMSTLEYSAFDPVFMILHSGLDRLWIIWQELQKIRRKPYNFAKCAYHMMEEPLAPFSYPS 
INQDEFTRANSKPSTVFDSHKFGYHYDNLNVRGHSIQELNTIINDLRNTDRIYAGFVLSGIGTSAS 
VKIYLRTDDNDEEVGTFTVLGGEREMPWAYERVFKYDITEVADRLKLSYGDTFNFRLEITSYDGSV 
VNKSLPNPFIIYRPANHDYDVLVIPVGRNLHIPPKVWKRGTRIEFHPVDDSVTRPWDLGSYTAL 
FNCWPPFTYRGFELNHVYSVKPGDYYVTGPTRDLCQNADVRIHIHVEDE 
Genomic sequence of the KLH1 gene

DOMAIN 1B

GGCCTACCGTACTGGGACTGGACTGAACCCATGACACACATTCCGGGTCTGGCAGGAAACAAAACT 
TATGTGGATTCTCATGGTGCATCCCACACAAATCCTTTTCATAGTTCAGTGATTGCATTTGAAGAA 
AATGCTCCCCACACCAAAAGACAAATAGATCAAAGACTCTTTAAACCCGCTACCTTTGGACACCAC 
ACAGACCTGTTCAACCAGATTTTGTATGCCTTTGAACAAGAAGATTACTGTGACTTTGAAGTCCAA 
TTTGAGATTACCCATAACACGATTCACGCTTGGACAGGAGGAAGCGAACATTTCTCAATGTCGTCC 
CTACATTACACAGCTTTCGATCCTTTGTTTTACTTTCACCATTCTAACGTTGATCGTCTTTGGGCC 
GTTTGGCAAGCCTTACAGATGAGACGGCATAAACCCTACAGGGCCCACTGCGCCATATCTCTGGAA 
CATATGCATCTGAAACCATTCGCCTTTTCATCTCCCCTTAACAATAACGAAAAGACTCATGCCAAT 
GCCATGCCAAACAAGATCTACGACTATGAAAATGTCCTCCATTACACATACGAAGATTTAACATTT 
GGAGGCATCTCTCTGGAAAACATAGAAAAGATGATCCACGAAAACCAGCAAGAAGACAGAATATAT 
GCCGGTTTTCTCCTGGCTGGCATACGTACTTCAGCAAATGTTGATATCTTCATTAAAACTACCGAT 
TCCGTGCAACATAAGGCTGGAACATTTGCAGTGCTCGGTGGAAGCAAGGAAATGAAGTGGGGATTT 
GATCGCGTTTTCAAGTTTGACATCACGCACGTTTTGAAAGATCTCGATCTCACTGCTGATGGCGAT 
TTCGAAGTTACTGTTGACATCACTGAAGTCGATGGAACTAAACTTGCATCCAGTCTTATTCCACAT 
GCTTCTGTCATTCGTGAGCATGCACGTGGTAAGCTGAATAGAG 

INTRON 1B/1C (SEQ ID NO: 139)

GTTTTGTAATAATTATGTAGAATTCTTTACCTCAGAATAAGATGAGGTCACATGGGTTTTGCAAAA 
CTATTACGTTCGAATTAATATTAATAATACCGGACCCTCCACTGGTACATATTTATCTTTATAACG 
ATAATAGCGATGATGATGATGATGATGATGATGATGATGATGATGATAATGATGATGCCGGTATTG 
CACGTAATCCAGCCGACTTAGATGACACCCTAAGGGTGCAGAAAGTATAACAATTAGATTGCGTTT 
GCATCTGTGTATGCGTGTGCTTTAACCAAAAGTCAAAATAAAAGTGCAAACCCTTAGTTTATTCAT 
TTGATAGAGCCTTTTACGATAAGAACAATGTAATAAATTAGAACATAACTGAAACCTCCGAAAGAA 
GGCCTGTTTGTCAAGAGAGGTATCGACATGATTGACTTATAAACCTGTGCTTCTATATTTTGGAAC 
TGTCCACTTTCTTGTTGTGTGTACTGTAATCACATCGCACTATGGCTGCAAGACGTGTACGAGTAC 
ACTATATACTTACCTAATGACCAACCACAAGGCTGGCTTTGTTAATATTGTTATTTCACAGAAATA 
AACACAGAATTCCAGCATTTGGCTGGTGTATTTAGCAAAACACCGATATGACACTCATGTTTTATT 
ACATTTTTTTCAG 

DOMAIN 1C 

TTAAATTTGACAAAGTGCCAAGGAGTCGTCTTATTCGAAAAAATGTAGACCGTTTGAGCCCCGAGG 
AGATGAATGAACTTCGTAAAGCCCTAGCCTTACTGAAAGAGGACAAAAGTGCCGGTGGATTTCAGC 
AGCTTGGTGCATTCCATGGGGAGCCAAAATGGTGTCCTAGTCCCGAAGCATCTAAAAAATTTGCCT 
GCTGTGTTCACGGCATGTCTGTGTTCCCTCACTGGCATCGACTGTTGACGGTTCAGAGTGAAAATG 
CTTTGAGACGACATGGCTACGATGGAGCTTTGCCGTACTGGGATTGGACCTCTCCTCTTAATCACC 
TTCCCGAACTGGCAGATCATGAGAAGTACGTCGACCCTGAAGATGGGGTAGAGAAGCATAACCCTT 
GGTTCGATGGTCATATAGATACAGTCGACAAAACAACAACAAGAAGTGTTCAGAATAAACTCTTCG 
AACAGCCTGAGTTTGGTCATTATACAAGCATTGCCAAACAAGTACTGCTAGCGTTGGAACAGGACA 
ATTTCTGTGACTTTGAAATCCAATATGAGATTGCCCATAACTACATCCATGCACTTGTAGGAGGCG 
CTCAGCCTTATGGTATGGCATCGCTTCGCTACACTGCTTTTGATCCACTATTCTACTTGCATCACT 
CTAATACAGATCGTATATGGGCAATATGGCAGGCTTTACAGAAGTACAGAGGAAAACCGTACAACG 
TTGCTAACTGTGCTGTTACATCGATGAGAGAACCTTTGCAACCATTTGGCCTCTCTGCCAATATCA 
ACACAGACCATGTAACCAAGGAGCATTCAGTGCCATTCAACGTTTTTGATTACAAGACCAATTTCA 
ATTATGAATATGACACTTTGGAATTTAACGGTCTCTCAATCTCTCAGTTGAATAAAAAGCTCGAAG 
CGATAAAGAGCCAAGACAGGTTCTTTGCAGGCTTCCTGTTATCTGGTTTCAAGAAATCATCTCTTG
TTAAATTCAATATTTGCACCGATAGCAGCAACTGTCACCCCGCTGGAGAGTTTTACCTTCTGGGTG 
ATGAAAACGAGATGCCATGGGCATACGATAGAGTCTTCAAATATGACATAACCGAAAAACTCCACG 
ATCTAAAGCTGCATGCAGAAGACCACTTCTACATTGACTATGAAGTATTTGACCTTAAACCAGCAA 
GCCTGGGAAAAGATTTGTTCAAGCAGCCTTCAGTCATTCATGAACCAAGAATAG 

INTRON 1C/1D (SEQ ID NO: 140)

GTACTTGTTATATGTTTCGAATATTGCCGATACCTTCAATATATATACTTTATCAAAGTAATTGAT 
TAATCTGAAGTAATTTTCCTTTCCAGTAGAGATTCAGTTGATACAACAAGAATTCGCCCTGTTGTA 
TGTCACTTTATTTTCATCAAACGATTCGAAGTGAGCTGTCCATGCCACAATGGGGTCTCTGTAACT 
TTCTCGTATGGGGTATAGATTATATAGACGTGGCAGACCTTACGTATAACTAATATTTGTGTAATG 
TCGTTTCAG 

DOMAIN 1D

GTCACCATGAAGGCGAAGTATATCAAGCTGAAGTAACTTCTGCCAACCGTATTCGAAAAAACATTG 
AAAATCTGAGCCTTGGTGAACTCGAAAGTCTGAGAGCTGCCTTCCTGGAAATTGAAAACGATGGAA 
CTTACGAATCAATAGCTAAATTCCATGGTAGCCCTGGTTTGTGCCAGTTAAATGGTAACCCCATCT 
CTTGTTGTGTCCATGGCATGCCAACTTTCCCTCACTGGCACAGACTGTACGTGGTTGTCGTTGAGA 
ATGCCCTCCTGAAAAAAGGATCATCTGTAGCTGTTCCCTATTGGGACTGGACAAAACGAATCGAAC 
ATTTACCTCACCTGATTTCAGACGCCACTTACTACAATTCCAGGCAACATCACTATGAGACAAACC 
CATTCCATCATGGCAAAATCACACACGAGAATGAAATCACTACTAGGGATCCCAAGGACAGCCTCT 
TCCATTCAGACTACTTTTACGAGCAGGTCCTTTACGCCTTGGAGCAGGATAACTTCTGTGATTTCG 
AGATTCAGTTGGAGATATTACACAATGCATTGCATTCTTTACTTGGTGGCAAAGGTAAATATTCCA 
TGTCAAACCTTGATTACGCTGCTTTTGATCCTGTGTTCTTCCTTCATCACGCAACGACTGACAGAA 
TCTGGGCAATCTGGCAAGACCTTCAGAGGTTCCGAAAACGGCCATACCGAGAAGCGAATTGCGCTA 
TCCAATTGATGCACACGCCACTCCAGCCGTTTGATAAGAGCGACAACAATGACGAGGCAACGAAAA 
CGCATGCCACTCCACATGATGGTTTTGAATATCAAAACAGCTTTGGTTATGCTTACGATAATCTGG 
AACTGAATCACTACTCGATTCCTCAGCTTGATCACATGCTGCAAGAAAGAAAAAGGCATGACAGAG 
TATTCGCTGGCTTCCTCCTTCACAATATTGGAACATCTGCCGATGGCCATGTATTTGTATGTCTCC 
CAACTGGGGAACACACGAAGGACTGCAGTCATGAGGCTGGTATGTTCTCCATCTTAGGCGGTCAAA 
CGGAGATGTCCTTTGTATTTGACAGACTTTACAAACTTGACATAACTAAAGCCTTGAAAAAGAACG 
GTGTGCACCTGCAAGGGGATTTCGATCTGGAAATTGAGATTACGGCTGTGAATGGATCTCATCTAG 
ACAGTCATGTCATCCACTCTCCCACTATACTGTTTGAGGCCGGAACAG 

INTRON 1D/1E (SEQ ID NO: 141)

GTAACTATTTTGTCACTGTAACCAACAACTGCAGTCTATTTTGCAATTACGATAATAACAATTTTT 
GAAATATATCTTTATTAAAGCAAAGGTTTCTAGAGACAAACAGCCGGCTCTAATTATTTTTTCGAA 
CTTACGCTTGAGTAAAGATCTGCAAATGGCAACCCTACCTATACTATTAAAAATATAATGTTACAT 
TCGTATCTGAATGTTTAATAAATCACTTCATATTCTGTTGCAG 

DOMAIN 1E

ATTCTGCCCACACAGATGATGGACACACTGAACCAGTGATGATTCGCAAAGATATCACACAATTGG 
ACAAGCGTCAACAACTGTCACTGGTGAAAGCCCTCGAGTCCATGAAAGCCGACCATTCATCTGATG 
GGTTCCAGGCAATCGCTTCCTTCCATGCTCTTCCTCCTCTTTGTCCATCACCAGCTGCTTCAAAGA 
GGTTTGCGTGCTGCGTCCATGGCATGGCAACGTTCCCACAATGGCACCGTCTGTACACAGTCCAAT 
TCCAAGATTCTCTCAGAAAACATGGTGCAGTCGTTGGACTTCCGTACTGGGACTGGACCCTACCTC 
GTTCTGAATTACCAGAGCTCCTGACCGTCTCAACTATTCATGACCCGGAGACAGGCAGAGATATAC
CAAATCCATTTATTGGTTCTAAAATAGAGTTTGAAGGAGAAAACGTACATACTAAAAGAGATATCA 
ATAGGGATCGTCTCTTCCAGGGATCAACAAAAACACATCATAACTGGTTTATTGAGCAAGCACTGC 
TTGCTCTTGAACAAACCAACTACTGCGACTTCGAGGTTCAGTTTGAAATTATGCATAATGGTGTTC 
ATACCTGGGTTGGAGGCAAGGAGCCCTATGGAATTGGCCATCTGCATTATGCTTCCTATGATCCAC 
TTTTCTACATCCATCACTCCCAAACTGATCGTATTTGGGCTATATGGCAATCGTTGCAGCGTTTCA 
GAGGACTTTCTGGATCTGAGGCTAACTGTGCTGTAAATCTCATGAAAACTCCTCTGAAGCCTTTCA 
GCTTTGGAGCACCATATAATCTTAATGATCACACGCATGATTTCTCAAAGCCTGAAGATACATTCG 
ACTACCAAAAGTTTGGATACATATATGACACTCTGGAATTTGCAGGGTGGTCAATTCGTGGCATTG 
ACCATATTGTCCGTAACAGGCAGGAACATTCAAGGGTCTTTGCCGGATTCTTGCTTGAAGGATTTG 
GCACCTCTGCCACTGTCGATTTCCAGGTCTGTCGCACAGCGGGAGACTGTGAAGATGCAGGGTACT 
TCACCGTGTTGGGAGGTGAAAAAGAAATGCCTTGGGCCTTTGATCGGCTTTACAAGTACGACATAA 
CAGAAACCTTAGACAAGATGAACCTTCGACATGACGAAATCTTCCAGATTGAAGTAACCATTACAT 
CCTACGATGGAACTGTACTCGATAGTGGCCTTATTCCCACACCGTCAATCATCTATGATCCTGCTC 
ATC 

INTRON 1E/1F (SEQ ID NO: 142)

GTAAGTATACACACATTATTTCTCTTCTGCTATATCAGATGAAGAGAACGTTGTATCACTAACCTA 
GTCTTGTTTGATTTGTGGTTTCGTTTGCTTCCTGAACAGTAGGGTTGATTTAACTTCTCTGTTTCG 
TCTGTACCAATGAAAGACTATGATGCTTGTGTGAAGATGCTTTGTTCATGAGTCAGTCTGTTCTTG 
TAATGCTTTGATCTTTGCCATCAACATTCTTGAAATTAATTATGGTTTCCCTTAAATACTTACATA 
TTACATTTAAACGTCGCTGCTTGTCTGATTGCATATTCTTTCAAAAATAACTATATATTCCAG 

DOMAIN 1F-1 (1st part of domain f) 

ATGATATTAGTTCGCACCACCTGTCGCTCAACAAGGTTCGTCATGATCTGAGTACACTGAGTGAGC 
GAGATATTGGAAGCCTTAAATATGCTTTGAGCAGCTTGCAGGCAGATACCTCAGCAGATGGTTTTG 
CTGCCATTGCATCCTTCCATGGTCTGCCTGCCAAATGTAATGACAGCCACAATAACGAG 

INTRON 1F-1/1F-2 (SEQ ID NO: 143) 

GTAAATATACAGTGAAATCCGGATAAGTAAAATCCAGATAAGAAAAAAAACATTTTCTGTGGTCCC 
GGCATGTTTCTTCTTCATCTATCATTATTTTGATACGGATAAGTAAAAATCGGCTGAGTAAAACAT 
CCGGGTAAGTAAAATGATTTTCGAGGTCTCTTCATCGGATAAGTAAGATACACAAGTGATCATTCC 
AATAAACACTAACTGATGCAACACAATACCAGCGCACAGTGTTTTCACTACGTTTGTTTGTATTGT 
AATTAACAATTAACACTTAAGTGTTTCCCAATGTGTCCGTGTGCAAACTGATTGGGACAAAGCTTG 
CAACAAGCCCGGCAATTCCATGTCGTTTATGTCTACGTTTGTTATTCTGACTGCTTGGAGGGGTTC 
GGAAAAAAATAAAAAACGGGTAAATATTATAAAAAATTCACGGTGCCTTGAAATTTTAGGTGTCCG 
GATTTCACTGTAGATGATTAATTTCTCACTTGTAAACAAAAGGACCCCAGTACCCTCATTCGTGAC 
GTACGTTATAAAATGTAATTATAAAAAGCCCATTATCATGTTATACGTGATCTTGNCTTGCAATTA 
TNCTACCGCTTTCTTGATTTTTTAAAGCAATTTCTCCCTCTATGAACTTATTAACATAGCACTCCT 
GCAAAAGAAAACAGTCACTGCATGGATCCATATTGAATGTTGCTGCTTATTTCTCATTTTATTACT 
CACAGATATTTCAAGAACATCGTACTCTCTAACCAGGCTAAAGCAAAGAGGGTTACATTTTAGCCG 
ACAAGTTCACTAGCTGAGTGGAACACGTATATATTAATGGAGATGACTCTGGTCATGATGATTAGG 
ACAATTATCATGACGTTATCATTGATCATGACCATGTCAGTATAATAGATAGCTAACAAATAATGT 
AATTACTAATTATGAAGCAATGGTGCATTTGCAG 

DOMAIN 1F-2 (2nd part of domain f)

GTGGCATGCTGTATCCATGGAATGCCTACATTCCCCCACTGGCACAGACTCTACACCCTCCAATTT
GAGCAAGCTCTAAGAAGACATGGCTCTAGTGTAGCAGTACCCTACTGGGACTGGACAAAGCCAATA 
CATAATATTCCACATCTGTTCACAGACAAAGAATACTACGATGTCTGGAGAAATAAAGTAATGCCA 
AATCCATTTGCCCGAGGGTATGTCCCCTCACACGATACATACACGGTAAGAGACGTCCAAGAAGGC 
CTGTTCCACCTGACATCAACGGGTGAACACTCAGCGCTTCTGAATCAAGCTCTTTTGGCGCTGGAA 
CAGCACGACTACTGCGATTTTGCAGTCCAGTTTGAAGTCATGCACAACACAATCCATTACCTAGTG 
GGAGGACCTCAAGTCTATTCTTTGTCATCCCTTCATTATGCTTCATATGATCCGATCTTCTTCATA 
CACCACTCCTTTGTAGACAAGGTTTGGGCTGTCTGGCAGGCTCTTCAAGAAAAGAGAGGCCTTCCA 
TCAGACCGTGCTGACTGCGCTGTTAGTCTGATGACTCAGAACATGAGGCCTTTCCATTACGAAATT 
AACCATAACCAGTTCACCAAGAAACATGCAGTTCCAAATGATGTTTTCAAGTACGAACTCCTGGGT 
TACAGATACGACAATCTGGAAATCGGTGGCATGAATTTGCATGAAATTGAAAAGGAAATCAAAGAC 
AAACAGCACCATGTGAGAGTGTTTGCAGGGTTCCTCCTTCACGGAATTAGAACCTCAGCTGATGTC 
CAATTCCAGATTTGTAAAACATCAGAAGATTGTCACCATGGAGGCCAAATCTTCGTTCTTGGGGGG 
ACTAAAGAGATGGCCTGGGCTTATAACCGTTTATTCAAGTACGATATTACCCATGCTCTTCATGAC 
GCACACATCACTCCAGAAGACGTATTCCATCCCTCTGAACCATTCTTCATCAAGGTGTCAGTGACA 
GCCGTCAACGGAACAGTTCTTCCGGCTTCAATCCTGCATGCACCAACCATTATCTATGAACCTGGT 
CTCGGTG 

INTRON 1F-2/1G-1 (SEQ ID NO: 144) 

GTCTCGGTGAGTTATTAAAAGAAACAAAATATTTACCATTACCATTGTTAACTACAAAAATGAGTG 
AGATATCTTATATCACTGGTACACTACTGATATTTTATGCAATGAAATTACTATTTTTCCAGGTAC 
GCTTCAACCCCTCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCATCATGCTTTTCTGT 
AAAACATAAAACACCAATTAACAATGTTCTTAGTGTGTTTGTTGACTCCCTTCCACTGCAACGCCT 
ACATAATCAAAGTGTTCGTTTTTTTCCAAACTTTCCAGTTAGTGTTGAAGACTAAAAAGTTAAATA 
AGCATTCACATAACTTCTAAGAGCAACTGGGACCATGCAGTTACGTATTGATATTTCTGTGAGAGT 
GAAGCAAAACACTGTTTTTCAAGCTTAGGTTTATCAATCAAAATGTCCAATAGTTCATGTTATCGA 
AAAGGCAGCGAAGGATAAGAGGCTCCGAGACATCTTGTCTATTCTCGTGTTCATATGATATCAACT 
GAGGAGCTTCCATTACATTTTTGACCTTATCATTTAAAGACATACATGGAACATTTTCATTTTACA 
GTTAAAGTGAACCACTTCAGGTTCAACTTCAACTTCGAATTCAACTTCTGTTGTGTGTTTTATGAG 
CCGACTGAAATAGAGTGCCTTACTTTCACTTCTAGTTTCGTTCTGTCTCGTCATCGTTGTTTCTTT 
CAGTGTGCATAGTACACGCCTAGTATAGAACACACGAACTTGTCCTTACTTAATAGATTCTGAAAC 
TATTATGTGGAAAGTTGGCAGGCTATAGTAACATCCTGGCAAAATTATCATGTATCCTCTTGTTTG 
TCATAATTAG 

DOMAIN 1G-1 (1st part of domain g) 

ACCATCACGAAGATCATCATTCTTCTTCTATGGCTGGACATGGTGTCAGAAAGGAAATCAACACAC 
TTACCACTGCAGAGGTGGACAATCTCAAAGATGCCATGAGAGCCGTCATGGCAGACCACGGTCCAA 
ATGGATACCAGGCTATAGCAGCGTTCCATGGAAACCCACCAATGTGCCCTATGCCAGATGGAAAGA 
ATTACTCGTGTTGTACACATG 

INTRON 1G-1/1G-2 (SEQ ID NO: 145)

GTATGTATTTCCCACTGGTGGTCGCTGACTGCCAACACATACTTGTAATTTATTCATGAAAGTATA 
ATAGTTTGTTTGAAAGTATATTTATAACCATCTTGCACAAGCGTCACGAATTTTCACCACAAAGCT 
TCAAAACGCCCAAAACATTCTAATAGCGATATATTTGTTAAAAGACCAAAATATAGCCTTACAACA 
ATAGATTATTTTAATAAGACCAGTCAGTGCATGCAAATCGATTGGAAACTTTGAAATAAAATATTC 
TATGTACTAACTGCCAATCTCATAATACTTGCCTTGGATGTGCTTCTTTTTCACATTCGCGTCGAG 
CTTCAACTCCAATGCATAAGCTTAAAAATAATCATAAACACAAACAAATAGCCACAGAGGCGACGA 
TCCCTCCAGGCCAGGCTTTATTTGTCTCTTATAGAATATATCGCTATTAGAATGTTTTTGACGTTT 
TGAAGCTTTGTGGGTGAAAATTCGTGATGTTTATGCGTGGTATTTATGTAAGATGAAAATAAATAT
ATCTTTTCAAACAAGATTTTAGTATTTTGAAGACTTCTATGAATAAATTACACTTATGTGTTAGGT 
TATTGGTCACTGAGCGCTTGTGGTATTTTCCCTTCTTCAATTTGTTTGTTCTTTGTTCAATTTCGA 
ATAGTTATCCTACTGTGGATAGTCTATATGAGAATCGTTGAAAGAATAATACAATTCTAATGGATT 
GCAACTTCTTTAACTTTTATTTGCAACTGCCACGTTTCGGTATACGTTCTTATGCCGTCATCAAGC 
ATACGAGTGTACATGTATGCCAAAACGCTGCAAATAAAAATTAAAGAAGTTGCAATCCATAAGAAT 
TTCAATGTTCTTTCATCATCACATCAACTTCTAAAAATGCCTATAAAACAATCAACAAACGTACAA 
TAGTACATTACCGGATCTCGCAGCATGACCACGTCGATATCTAAACAATATCACTATCCATTAATA 
GGATCAAGAGTAGGTACAGACATGTTCAGTTATAAATACTCTTCAAAAAAGTAGGGGAACTTGGAA 
TTTCAAGGTCAATAACAAACTAATGATAATAACAATTGGTCCCAAATAATAACAATTGGTCCCAAA 
CTAATTGTATCTTTACAAAGAAGAAATTGAGTGAACAATTCACCCGGTATTTTATTACCTAAACCG 
TTTCTCTTGCTGTTATGGTGCGTGAAAGAAGAAATGGGTAAGAAACGGAAATTGACATTTTTGCGT 
CAGTGGTGCGTAATGCCCCCATTGTTGGCCAAACACTGATTGATTCGCTGAGGCATCGTGCATACG 
CGTCTACCTATGGTAATTTGATGCAGTCTGTCCCATTCTTCCACCAACGCCTGGACAAGTTCATCT 
AGCGTGGCTGGTGGCCTTTCACGTTGACGCACACGTCGGCCCAAGATGTCCCAGACATTTTCAATG 
GCCAGGGCTCATTGCTGGTCAGGGCATCCTATGGATATTGTGCCGTTGAAGGTGGTTATGTTGTTC 
ACATTGAAATTGCAAGTTCTCCTACTCTTTTTAAGAGGAGGTTCACAAAGTACGTTCTTTCATGTT 
GGTGAAGAGAATATCAAGGTCTTCTAAGGGATTGTGTCTTATAATATTTGATTTTAAGAAGTTTGA 
TATTATCTGCATCCTTCCCAAGAAATTGCAAATGTTCACACACTATTGCGTTTGATAATGTTTTTG 
GGGAAATAAACTGTCCAGGACTGCTAAATAGTAATTATTGCTACTTTTAG 

DOMAIN 1G-2 (2nd part of domain g) 

GCATGGCTACTTTCCCCCACTGGCACAGACTGTACACAAAACAGATGGAAGATGCCTTGACCGCCC 
ATGGTGCCAGAGTCGGCCTTCCTTACTGGGACGGGACAACTGCCTTTACAGCTTTGCCAACTTTTG 
TCACAGATGAAGAGGACAATCCTTTCCATCAT 

INTRON 1G-2/1G-3 (SEQ ID NO: 146) 

GTGAGTTCACGTAAGCCTACGAGATCAACATTACTCCTTAACAGCCACGGCATCATGTACCGATAT 
ATCACAAACAAAAGTATTCAAAGCTTTAAACACGATATGTATGGTTCAAGAATGACATCATTAAAC 
AAGGACATGAGTCTGAAATAAACATGACTTGACACCGTTGTGGTCACAGTTTTGTTTCTCATTGGT 
GAACCTGTGAAACAACCTTTCAAACCAAAAGATGCCTATTAATATTGTTAATTCCCATGAATTAGG

AGATACACACATTCTACTGTCATTT AATAACCGCTTC 

CAGCATGAAAACACAATATGATTATCTCAATTCTACCATTACTAATTATAATTTTGACTGGCATTA 
TTTGACGACGCGTAAAACATCGCTGCTTTACAGACTGCACTGCGGTAACTGTGACGTTTTCATGAC 
GTCACTACATTCTATTCAAAACATTTCCACAGAAGAGCGAGACCACGGCCGTGATGGGTTCTGGGC 
AGATGATTACCCAAGTATATATTTATAATAACTTGACTGCTTGCCTGAATAATGTTGACACATGAC 
AACGAATTTGTGATAGCGTAAGAAGCGTGAATACTGTGAATAGTGTGAGGGGTGTTTGCTGAGAGT 
TAACCACCGTTAATTGCAAAATTCCCGAATACTTGCATTTGCAGTCGAAGAAGAATTGCATTCTTA
CTCCTGTGAATGGACTCATTGTTATTTAGCAGCGGTTATTGAGGTTTTGATCACCTCTAAATAGAC
AATCAGGATGCGGCAAACCGGAAAATTATAGCAGAATCTGTAATTCAAGATGGGCTTGCCTGTGAA
AATATGCTGCGAGTTCAGTAACACTTTTCCCTTTCGATCATGGCCTGTTTTGCTCTGAATCTGGTC
TTTCAGAGGATCCCTGCTTTTTTAAAACTAAAGTCCTCCCAACTCACTTATATTTATGTTTTTTAA
TTATTTATAGTTTTAATATGAACAACAAATCATATTTATTTACACATTATATTTTTCAG

DOMAIN 1G-3 (3rd part of domain g) 

GGTCACATAGACTATTTGGGAGTGGATACAACTCGGTCGCCCCGAGACAAGTTGTTCAATGATCCA 
GAGCGAGGATCAGAATCGTTCTTCTACAGGCAGGTTCTCTTGGCTTTGGAGCAGACAGAT 

Primary structure of the KLH1 protein

DOMAIN B 

GLPYWDWTEPMTHIPGLAGNKTYVDSHGASHTNPFHSSVIAFEENAPHTKRQIDQRLFKPATFGHH 
TDLFNQILYAFEQEDYCDFEVQFEITHNTIHAWTGGSEHFSMSSLHYTAFDPLFYFHHSNVDRLWA 
VWQALQMRRHKPYRAHCAISLEHMHLKPF 
GGISLENIEKMIHENQQEDRIYAGFLLAGIRTSA^ 

DRVFKFDITHVLKDLDLTADGDFEVTVDITEVDGTKLASSLIPHASVIREHARGKLNR 

DOMAIN C

VKFDKVPRSRLIRKNVDRLSPEEMNELRKALALLKEDKSAGGFQQLGAFHGEPKWCPSPEASKKFA
CCVHGMSVFPHWHRLLTVQSENALRRHGYDGALPYWDWTSPLNHLPELADHEKYVDPEDGVEKHNP
WFDGHIDTVDKTTTRSVQNKLFEQPEFGHYTSIAKQVLLALEQDNFCDFEIQYEIAHNYIHALVGG
AQPYGMASLRYTAFDPLFYLHHSNTDRIWAIWQALQKYRGKPYNVANCAVTSMREPLQPFGLSANI
NTDHVTKEHSVPFNVFDYKTNFNYEYDTLEFNGLSISQLNKKLEAIKSQDRFFAGFLLSGFKKSSL
VKFNICTDSSNCHPAGEFYLLGDENEMPWAYDRVFKYDITEKLHDLKLHAEDHFYIDYEVFDLKPA
SLGKDLFKQPSVIHEPRI 

DOMAIN D 

GHHEGEVYQAEVTSANRIRKNIENLSLGELESLRAAFLEIENDGTYESIAKFHGSPGLCQLNGNPI 
SCCVHGMPTFPHWHRLYVVVVENALLKKGSSVAVPYWDWTKRIEHLPHLISDATYYNSRQHHYETN
PFHHGKITHENEITTRDPKDSLFHSDYFYEQVLYALEQDNFCDFEIQLEILHNALHSLLGGKGKYS
MSNLDYAAFDPVFFLHHATTDRIWAIWQDLQRFRKRPYREANCAIQLMHTPLQPFDKSDNNDEATK 
THATPHDGFEYQNSFGYAYDNLELNHYSIPQLDHMLQERKRHDRVFAGFLLHNIGTSADGHVFVCL 
PTGEHTKDCSHEAGMFSILGGQTEMSFVFDRLYKLDITKALKKNGVHLQGDFDLEIEITAVNGSHL
DSHVIHSPTILFEAG 

DOMAIN E 

TDSAHTDDGHTEPVMIRKDITQLDKRQQLSLVKALESMKADHSSDGFQAIASFHALPPLCPSPAAS 
KRFACCVHGMATFPQWHRLYTVQFQDSLRKHGAVVGLPYWDWTLPRSELPELLTVSTIHDPETGRD
IPNPFIGSKIEFEGENVHTKRDINRDRLFQGSTKTHHNWFIEQALLALEQTNYCDFEVQFEIMHNG
VHTWVGGKEPYGIGHLHYASYDPLFYIHHSQTDRIWAIWQSLQRFRGLSGSEANCAVNLMKTPLKP 
FSFGAPYNLNDHTHDFSKPEDTFDYQKFGYIYDTLEFAGWSIRGIDHIVRNRQEHSRVFAGFLLEG 
FGTSATVDFQVCRTAGDCEDAGYFTVLGGEKEMPWAFDRLYKYDITETLDKMNLRHDEIFQIEVTI
TSYDGTVLDSGLIPTPSIIYDPAH

DOMAIN F 

HDISSHHLSLNKVRHDLSTLSERDIGSLKYALSSLQADTSADGFAAIASFHGLPAKCNDSHNNEVA 
CCIHGMPTFPHWHRLYTLQFEQALRRHGSSVAVPYWDWTKPIHNIPHLFTDKEYYDVWRNKVMPNP
FARGYVPSHDTYTVRDVQEGLFHLTSTGEHSALLNQALLALEQHDYCDFAVQFEVMHNTIHYLVGG 
PQVYSLSSLHYASYDPIFFIHHSFVDKVWAVWQALQEKRGLPSDRADCAVSLMTQNMRPFHYEINH
NQFTKKHAVPNDVFKYELLGYRYDNLEIGGMNLHEIEKEIKDKQHHVRVFAGFLLHGIRTSADVQF 
QICKTSEDCHHGGQIFVLGGTKEMAWAYNRLFKYDITHALHDAHITPEDVFHPSEPFFIKVSVTAV 
NGTVLPASILHAPTIIYEPGLG

DOMAIN G

DHHEDHHSSSMAGHGVRKEINTLTTAEVDNLKDAMRAVMADHGPNGYQAIAAFHGNPPMCPMPDGK
NYSCCTHGMATFPHWHRLYTKQMEDALTAHGARVGLPYWDGTTAFTALPTFVTDEEDNPFHHGHID 
YLGVDTTRSPRDKLFNDPERGSESFFYRQVLLALEQTD 

Genomic sequence of the KLH2 gene



DOMAIN 2B 

GGCCTGCCCTACTGGGATTGGACCATGCCAATGAGTCATTTGCCAGAACTGGCTACAAGTGAGACC 
TACCTCGATCCAGTTACTGGGGAAACTAAAAACAACCCTTTCCATCACGCCCAAGTGGCGTTTGAA 
AATGGTGTAACAAGCAGGAATCCTGATGCCAAACTTTTTATGAAACCAACTTACGGAGACCACACT 
TACCTCTTCGACAGCATGATCTACGCATTTGAGCAGGAAGACTTCTGCGACTTTGAAGTCCAATAT 
GAGCTCACGCATAATGCAATACATGCATGGGTTGGAGGCAGTGAAAAGTATTCAATGTCTTCTCTT 
CACTACACTGCTTTTGATCCTATATTTTACCTCCATCACTCAAATGTTGATCGTCTCTGGGCCATT 
TGGCAAGCTCTTCAAATCAGGAGAGGCAAGTCTTACAAGGCCCACTGCGCCTCGTCTCAAGAAAGA 
GAACCATTAAAGCCTTTTGCATTCAGTTCCCCACTGAACAACAACGAGAAAACGTACCACAACTCT 
GTCCCCACTAACGTTTATGACTATGTGGGAGTTTTGCACTATCGATATGATGACCTTCAGTTTGGC 
GGTATGACCATGTCAGAACTTGAGGAATATATTCACAAGCAGACACAACATGATAGAACCTTTGCA 
GGATTCTTCCTTTCATATATTGGAACATCAGCAAGCGTAGATATCTTCATCAATCGAGAAGGTCAT 
GATAAATACAAAGTGGGAAGTTTTGTAGTACTTGGTGGATCCAAAGAAATGAAATGGGGCTTTGAT 
AGAATGTACAAGTATGAGATCACTGAGGCTCTGAAGACGCTGAATGTTGCAGTGGATGATGGGTTC 
AGCATTACTGTTGAGATCACCGATGTTGATGGATCTCCCCCATCTGCAGATCTCATTCCACCTCCT 
GCTATAATCTTTGACGTGGTCAGAG 

INTRON 2B/2C (SEQ ID NO: 147) 

GTATTTAAAAAAGTAATAAAACCATATTTTCGAATGCGCTTTATGAAATATCGTGTGACTGGTTCT 
TTAGTTTACATGGAGTGTAACAACATGCTCCATCAGTTGACATATACTGCTCACACAAAGTAAGGG 
ATATTTGATAATGATAACAAATATAATCAAAGCGGTTATACTATCAAGACTTATTCACATAATTAC 
AGGTGAAGGGAGGTGTGATCGTGTTCACTGATCAGGTTGAGGCCAGAGAAGTCCCAGTTTGAGTCT 
TGCAGAAGATGATGTTTAGGCATGGGGTCGAATCACCAAAATCACATGACTTCAATAACGGGTTGG 
ACCACCTCGAGCGACGATGCAAGCAGTAGAGCGTCTACGCATGCTCCTGATAAGGCGACCAATCTG 
TTCCTGGGGAATCAGTCGCCACTCCTCTTGTAGTGCCACGCTCATTTCTGCTACGGTCCTGGGTAC 
CTGCTATCGGGTCTTGATCCGTATCCCAAGGATGTCCCACACATGTTCAAGGTGAGAGGTCGGGGA 
ACATCGCTGGCCACGGTAAGGTCTGAATTTGATGCCGTTGAAAGTGAGCTCTGACAACCTGAGCAT 
GGTGAGCTCTGACGTTGTCGTCCTGAAAGATGAATCCAGCTCCATGACAGCGAGCAAAGGGCAGGA 
CGTGTTGGTCAATGCAGTTGTCTCTGCAGTACACACCTGTCACTCGCCACTCACAAGCGTGTAGAT 
CTGTACGACCAGTCATGGAGATCCCAGCCCACATCATAACGGACCCCTATCCATACCGATCATGAG 
CCACCATAGCAGCGTCTTGATGACGTTCTCCCTGTCGCCTCGACATCCTCACACGGCCAAAAGGAA 
CGTGGACTCGTCACTGAACATGACATTAGCCAACCTGGCACTTGTCCACCGCTGATGTTGGCGAGA 
CCATTCCAGTCGAGCTCTTCGGTGTCTGGCTTTCATCGATAACACGACGTAAGGTCTGCGGGCGTG 
CAAGACGGCTCTATGCAGGCGATTTCGGATTGTCTGGGTGCTAACTCTGATCCCAGGTGCCTGCTG 
AAGTTGATGCTGGATCTGTGTGGCATTGAGATGGCGATTCCTTAGGACTGTGGAGATGATGAATCG 
ATCTTGACTTATGGTGGTGACATTAGGACGTCGGGTTCGTGTCCTATCCTGCACTCTTCCAGTTGT 
TCGGTGACGCTCTGGTACCCGGCTGATTACTGACTGAGAATATCCATCTGCCGTGCGACATGAGCC 
TGTGTTGGCCCAGCCTGAAGCATTGCAATCGCCAGAGACGCTCTTCAAAAGTCATTCGACGCATGG 
TTTTCTGTTCACAAATGACAGCGTAAAACAGTTTTTGGTGCTTTTATGCTTCCCAAGAGCATGAAA 
AACACGTTCTATGGGTCGTGCACACCTTACATGACAAGTGTGAAAAGTGACTTGCACCCCCTTGTG 
TGTTCGGATGCACACTCTGTTTACGTACTGATGCGATTTGGCGTCTAAACATGTTTTGGCGTCTAA 
ACATGTTTTCCTGCATGATTCATATACTATTTTGTCATATTCCTGGCATCAAACCAAACTACAGTG 
AAATATATTTCAATATCCCCTACTTTGTGTGAGTAGTATAGATCACTGCAGACAACATATAGACAA 
TGCAGTTACACCGTCAACAATCCCAGTCATTAATTATGATGACACTTCCACACATAGTGTCAGTGA 
TTGTAATTCAACTGTACACACTTTTCCCGTGAACATTCAGGATCTATATGACTAAATATATAACAT 
TAGTATACGTGCAGTTTTGTATCGCTACGACATTGTTGTAACTCTTTGTTTAATCATTTAACAG 

DOMAIN 2C

CTGATGCCAAAGACTTTGGCCATAGCAGAAAAATCAGGAAAGCCGTTGATTCTCTGACAGTCGAAG 
AACAAACTTCGTTGAGGCGAGCTATGGCAGATCTACAGGACGACAAAACATCAGGGGGTTTCCAGC 
AGATTGCAGCATTCCACGGAGAACCAAAATGGTGTCCAAGCCCCGAAGCGGAGAAAAAATTTGCAT 
GCTGTGTTCATGGAATGGCTGTTTTCCCTCACTGGCACAGATTGCTGACAGTTCAAGGAGAAAATG 
CTCTGAGGAAACATGGATTTACTGGTGGATTGCCCTATTGGGACTGGACTCGGCCAATGAGCGCCC 
TTCCACATTTTGTTGCTGATCCTACTTACAATGATTCTGTTTCCAGCCTCGAAGAAGATAACCCAT 
GGTATCATGGTCACATAGATTCTGTTGGGCATGATACTACAAGAGCTGTGCGTGATGATCTTTATC 
AATCTCCTGGTTTCGGTCACTACACAGATATTGCAAAACAAGTCCTTCTGGCCTTTGAGCAGGACG 
ATTTCTGTGATTTTGAGGTACAATTTGAAATTGCCCATAATTTCATACATGCTCTGGTTGGTGGTA 
ACGAACCATACAGTATGTCATCTTTGAGGTATACTACATACGATCCAATCTTCTTCTTGCACCGCT 
CCAATACAGACCGACTTTGGGCCATTTGGCAAGCTTTGCAAAAATACCGGGGGAAACCATACAACA 
CTGCAAACTGTGCCATTGCATCCATGAGAAAACCACTTCAGCCATTTGGTCTTGATAGTGTCATAA 
ATCCAGATGACGAAACTCGTGAACATTCGGTTCCTTTCCGAGTCTTCGACTACAAGAACAACTTCG 
ACTATGAGTATGAGAGCCTGGCATTTAATGGTCTGTCTATTGCCCAACTGGACCGAGAGTTGCAGA 
GAAGAAAGTCACATGACAGAGTCTTTGCAGGATTCCTTCTTCATGAAATTGGACAGTCTGCACTCG 
TGAAATTCTACGTTTGCAAACACAATGTATCTGACTGTGACCATTATGCTGGAGAATTCTACATTT 
TGGGAGATGAAGCTGAGATGCCTTGGAGGTATGACCGTGTGTACAAGTACGAGATAACACAGCAGC 
TGCACGATTTAGATCTACATGTTGGAGATAATTTCTTCCTTAAATATGAAGCCTTTGATCTGAATG 
GCGGAAGTCTTGGTGGAAGTATCTTTTCTCAGCCTTCGGTGATTTTCGAGCCAGCTGCAG 

INTRON 2C/2D (SEQ ID NO: 148) 

GTATGTTTTAAATGTCACTTATCCGTGATCTGTAATGAAGTTAGCAATTCACTTTATCAACTGTTT 
GGCTGTACTGTTTCAGTGCGAGTTTTACTTAGGTTGGATTAATTAAAATATTCAAGCTCATAAATG 
TTTTGATTCAACTTTTGTTATTTATTTCAAACAG 

DOMAIN 2D 

GTTCACACCAGGCTGATGAATATCGTGAGGCAGTAACAAGCGCTAGCCACATAAGAAAAAATATCC 
GGGACCTCTCAGAGGGAGAAATTGAGAGCATCAGATCTGCTTTCCTCCAAATTCAAAAAGAGGGTA 
TATATGAAAACATTGCAAAGTTCCATGGAAAACCAGGACTTTGTGAACATGATGGACATCCTGTTG 
CTTGTTGTGTCCATGGCATGCCCACCTTTCCCCACTGGCACAGACTGTACGTTCTTCAGGTGGAGA 
ATGCGCTCTTAGAACGAGGGTCTGCAGTTGCTGTTCCTTACTGGGACTGGACCGAGAAAGCTGACT 
CTCTGCCATCATTAATCAATGATGCAACTTATTTCAATTCACGATCCCAGACCTTTGATCCTAATC 
CTTTCTTCAGGGGACATATTGCCTTCGAGAATGCTGTGACGTCCAGAGATCCTCAGCCAGAACTAT 
GGGACAATAAGGACTTCTACGAGAATGTCATGCTGGCTCTTGAGCAAGACAACTTCTGTGACTTTG 
AGATTCAGCTTGAGCTGATACACAACGCCCTTCATTCTAGACTTGGAGGAAGGGCTAAATACTCCC 
TTTCGTCTCTTGATTATACCGCATTTGATCCTGTATTTTTCCTTCACCATGCAAACGTTGACAGAA 
TCTGGGCCATCTGGCAGGACTTGCAGAGATATAGAAAGAAACCATACAATGAGGCTGACTGCGCAG 
TCAACGAGATGCGTAAACCTCTTCAACCATTTAATAACCCAGAACTTAACAGTGATTCCATGACGC 
TTAAACACAACCTCCCACAAGACAGTTTTGATTATCAAAACCGCTTCAGGTACCAATATGATAACC 
TTCAATTTAACCACTTCAGCATACAAAAGCTAGACCAAACTATTCAGGCTAGAAAACAACACGACA 
GAGTTTTTGCTGGCTTTATTCTTCACAACATTGGGACATCTGCTGTTGTAGATATTTATATTTGCG 
TTGAACAAGGAGGAGAACAAAACTGCAAGACAAAGGCGGGTTCCTTCACGATTCTGGGGGGAGAAA 
CAGAAATGCCATTCCACTTTGACCGCTTGTACAAATTTGACATAACGTCTGCTCTGCATAAACTTG 
GTGTTCCCTTGGACGGACATGGATTCGACATCAAAGTTGACGTCAGAGCTGTCAATGGATCGCATC 
TTGATCAACACATCCTCAACGAACCGAGTCTGCTTTTTGTTCCTGGTGAACGTAAGAATATATATT 
ATG 

INTRON 2D/2E (SEQ ID NO: 149)

GTTATAAAGCAGTATATTCTCTTCAAAAAAGTAGGGGAACTTGGAATTTCAAGGTAAATAACATAA 
CTACCTTCAACGGCACAATATCCATATGATGCCCTGGCCAGCAATGAGGCCTGATCTTTTCCCCAT 
TAAAAATGTCTGGAACATCTTGGGCAAACGTGTGCGTCAACGTAAAACGCCACCAGTCACGCTAGA 
TGAACTTGTCCAGGCGTTGGTGGAAGAATGGGACAGACTGCATCAATTACCATAAGTAGACTCATT 
TGCAGCGAATCAGTCAGTGTTTGACCAATAACGGGGGCATTACGCACTACTGACGCAAAACAATGT 
CAATTTCCGTTTCTTACCCATTCCTTCTTTCACGGACCATAACAGCAAGAGAAACTGNTTAGGTAA 
TGAAATACCGGTGAATTATTGTTAACTGGATTCCTTCTTTGTAAAGATACAATTAGTTTGGGACCA 
ATTATTATTATCATTAGTTTGTTATTGACCTTGAAATTCGAAGTTCCTCTACATTTTTTAAGGAGT 
TTATTTGATTGACAATGAAATGTAAGAAAAGAGCAAATCGTAAAATACGTTAAAAATTATTCCTTA 
AACATCAGTCTCTAACTTCAGTTTAAATTGCCAGTAACACGTGTTATATGATGTTTCCGTTTCTCT 
TTGTTTTTTAGCATTCAACTTATTTGATATAACGTTTTACTGTTTTAGATTCACATCAAACTGCAG 

DOMAIN 2E 

ATGGGCTTTCACAACATAATCTTGTGCGAAAAGAAGTAAGCTCTCTTACAACACTGGAGAAACATT 
TTTTGAGGAAAGCTCTCAAGAACATGCAAGCAGATGATTCTCCAGACGGATATCAAGCTATTGCTT 
CTTTCCACGCTTTGCCTCCTCTTTGTCCAAGTCCATCTGCTGCACATAGACACGCTTGTTGCCTCC 
ATGGTATGGCTACCTTCCCTCAGTGGCACAGACTCTACACAGTTCAGTTCGAAGATTCTTTGAAAC 
GACATGGTTCTATTGTCGGACTTCCATATTGGGATTGGCTGAAACCGCAGTCTGCACTCCCTGATT 
TGGTGACACAGGAGACATACGAGCACCTGTTTTCACACAAAACCTTCCCAAATCCGTTCCTCAAGG 
CAAATATAGAATTTGAGGGAGAGGGAGTAACAACAGAGAGGGATGTTGATGCTGAACACCTCTTTG 
CAAAAGGAAATCTGGTTTACAACAACTGGTTTTGCAATCAGGCACTATATGCACTAGAACAAGAAA 
ATTACTGTGACTTTGAAATACAGTTCGAAATTTTGCATAATGGAATTCATTCATGGGTTGGAGGAT 
CAAAGACCCATTCAATAGGTCATCTTCATTACGCATCATACGATCCACTGTTCTATATCCACCATT 
CGCAGACAGATCGCATTTGGGCTATCTGGCAAGCTCTCCAGGAGCACAGAGGTCTTTCAGGGAAGG 
AAGCACACTGCGCCCTGGAGCAAATGAAAGACCCTCTCAAACCTTTCAGCTTTGGAAGTCCCTATA 
ATTTGAACAAACGCACTCAAGAGTTCTCCAAGCCTGAAGACACATTTGATTATCACCGATTCGGGT 
ATGAGTATGATTCCCTCGAATTTGTTGGCATGTCTGTTTCAAGTTTACATAACTATATAAAACAAC 
AACAGGAAGCTGATAGAGTCTTCGCAGGATTCCTTCTTAAAGGATTTGGACAATCAGCATCCGTAT 
CGTTTGATATCTGCAGACCAGACCAGAGTTGCCAAGAAGCTGGATACTTCTCAGTTCTCGGTGGAA 
GTTCAGAAATGCCGTGGCAGTTTGACAGGCTTTACAAGTACGACATTACAAAAACGTTGAAAGACA 
TGAAACTGCGATACGATGACACATTTACCATCAAGGTTCACATAAAGGATATAGCTGGAGCTGAGT 
TGGACAGCGATCTGATTCCAACTCCTTCTGTTCTCCTTGAAGAAGGAAAGC 

INTRON 2E/2F (SEQ ID NO: 150) 

GTATGTATCTCATGTTTCTCAAATAATTTGATTTTCAATGCCCTTACTATAAAGCACAGTTATTGT 
TCAGTGCCAGTAACCGTTTATTTACGTAAATGTTACAGGCTATTATAATCAAAAATACATTACCGA 
TATTGTTTACCACACAATTATATCATTGTCAAAATCTACCCCCATTACCTGCGTTTTGAATTTGTA 
ACCTTCTGACAAAAATGAATTAGCAAGAGCTCTGATGAAGAACATAATGAACAACACCTATCTTTC 
TTCTTTCAATGACGGTTTAACAATACAATGCACAATGTAAAAAAATATATATATATATATAATTTT 
ATATCTACAGTTAATGCAAATGACTCCACTAATTCAGGGAAACACATTTTCAG 

DOMAIN 2F-1 (1st part of domain f) 

ATGGGATCAATGTACGTCACGTTGGTCGTAATCGGATTCGTATGGAACTATCTGAACTCACCGAGA 
GAGATCTCGCCAGCCTGAAATCTGCAATGAGGTCTCTACAAGCTGACGATGGGGTGAACGGTTATC 
AAGCCATTGCATCATTCCACGGTCTCCCGGCTTCTTGTCATGATGATGAGGGACATGAG 

INTRON 2F (SEQ ID NO: 151)

GTAAAATAAAACGTCCAGTCATCGGAAACCCGCCCAGATATATGGGTTTTTTTCTATTTAAACAAA 
AAAGCAGAGACAAAAAGATTATTAAAAGTCACATTTAACTTGATATCAGATCAATAGTTTGGCTAG 
TTAGTGCTCTATATCCCTCAAATCCTTCGAATCTTTAAGCCTCGTGATATTTTGACAAACAGAGAA 
GACTTAGTAGCCCAGACTTTCCCTTATTTTTTCCTGAAAATCTTAATACGGATATTAAATGGATTC 
ATTCTGCAACCTACAACCATAGCCCATATGTTATTATTTCAG 

DOMAIN 2F-2 (2nd part of domain f) 

ATTGCCTGTTGTATCCACGGAATGCCAGTATTCCCACACTGGCACAGGCTTTACACCCTGCAAATG 
GACATGGCTCTGTTATCTCACGGATCTGCTGTTGCTATTCCATACTGGGACTGGACCAAACCTATC 
AGCAAACTGCCTGATCTCTTCACCAGCCCTGAATATTACGATCCTTGGAGGGATGCAGTTGTCAAT 
AATCCATTTGCTAAAGGCTACATTAAATCCGAGGACGCTTACACGGTTAGGGATCCTCAGGACATT 
TTGTACCACTTGCAGGACGAAACGGGAACATCTGTTTTGTTAGATCAAACTCTTTTAGCCTTAGAG 
CAGACAGATTTCTGTGATTTTGAGGTTCAATTTGAGGTCGTCCATAATGCTATTCACTACTTGGTG 
GGTGGTCGACAAGTTTATGCTCTTTCTTCTCAACACTATGCTTCATATGACCCAGCCTTCTTTATT 
CATCACTCCTTTGTTGACAAAATATGGGCAGTCTGGCAAGCTCTGCAAAAGAAGAGAAAGCGTCCC 
TATCATAAAGCGGATTGTGCTCTTAACATGATGACCAAACCAATGCGACCATTTGCACACGATTTC 
AATCACAATGGATTCACAAAAATGCACGCAGTCCCCAACACTCTATTTGACTTTCAGGACCTTTTC 
TACACGTATGACAACTTAGAAATTGCTGGCATGAATGTTAATCAGTTGGAAGCGGAAATCAACCGG 
CGAAAAAGCCAAACAAGAGTCTTTGCCGGGTTCCTTCTACATGGCATTGGAAGATCAGCTGATGTA 
CGATTTTGGATTTGCAAGACAGCTGACGACTGCCACGCATCTGGCATGATCTTTATCTTAGGAGGT 
TCTAAAGAGATGCACTGGGCCTATGACAGGAACTTTAAATACGACATCACCCAAGCTTTGAAGGCT 
CAGTCCATACACCCTGAAGATGTGTTTGACACTGATGCTCCTTTCTTCATTAAAGTGGAGGTCCAT 
GGTGTAAACAAGACTGCTCTCCCATCTTCAGCTATCCCAGCACCTACTATAATCTACTCAGCTGGT 
GAAG 

INTRON 2F-2/2G (SEQ ID NO: 152) 

GTGAGAGAAACTATAATAGTGTATGTCGGCAAAAAATGTGCTCATATCATGACTCTGTTGGCCGGT 
GGTTGCTCTCCTCTCCTCCTCCACCACCACCGGTACCTCCACCTGTCAGGGCATCAATGTACCATG 
AAAATGTCTACAATACTAGGCCTCCTGTAGAAGCACGTAAGATTTACATGGCCGGTTTGTAACTAG 
TTTAAAGTGCTTCACAGTAACCAAAACCAGTCTCTAAAGATTAATGTCTGTTTAAAATTTAATGCC 
ACATTTTCAACTGACATATTCTTGCAATTAAGTACAAATGAAGTAGTATAAATTATCCACAAATAG 
CGTGATGCACCACAAATATAAACCGAGTGCTTTTTTGGCATTCCCCACTTGTTCTGGCATGATCAC 
ATCATAGATCTCGTTCATGAAGATACTGTTGGATGCTTTTTCCCAATATGCCCCAATCTGTTAAAT 
TATTTACACGACCGCAGTGTGTACTTTCATCACTCAGATCTTTACAATGTGTTTGTAACGTTTACA 
ATTAGCGTTATGATTGAAATATTACCCCCTGCTACGTTAAATCACATTCACTCACTCATCTGATGT 
ACTTTACAGGTCATACCGATGATCACGGCTCAG 

DOMAIN 2G-1 (1st part of domain g) 

ATCATATTGCTGGCAGTGGAGTCAGGAAAGACGTGACGTCTCTTACCGCATCTGAGATAGAGAACC 
TGAGGCATGCTCTGCAAAGCGTGATGGATGATGATGGACCCAATGGATTCCAGGCAATTGCTGCTT 
ATCACGGAAGTCCTCCCATGTGTCACATGCCTGATGGTAGAGACGTTGCATGTTGTACTCATG 

INTRON 2G-1/2G-2 (SEQ ID NO: 153) 

GTCAGTATTCTCCAATATGTTTGACTAGTGTCTTGCTCATGTATCAACTATTTTAGGCAACGTTTT 
TGATTGTTATGGTATTTTCATGATATGATTTTATTGCTACCTCTATACCCAAACAAAAATGTTTTA 
TCAACAATTGTTTGAGTTTTAATGCAAGAAAATTATCAGGAGTAGCGTGCAAAAATGACTGGAAGG
CATGGTGTACTTCTGTGTGTACATACAAGTGGGTAATGCCTTATTGAACTCGTAATCACTCGTTTC 
AG 

DOMAIN 2G-2 (2nd part of domain g) 

GAATGGCATCTTTCCCTCACTGGCACAGACTGTTTGTGAAACAGATGGAGGATGCACTGGCTGCGC 
ATGGAGCTCACATTGGCATACCATACTGGGATTGGACAAGTGCGTTTAGTCATCTGCCTGCCCTAG 
TGACTGACCACGAGCACAATCCCTTCCACCAC 

INTRON 2G-2/2G-3 (SEQ ID NO: 154) 

GTCAGTATTCTCCAATATGTTTGACTAGTGTCTTGCTCATGTATCAACTATTTTAGGCAACGTTTT 
TGATTGTTATGGTATTTTCATGATATGATTTTATTGCTACCTCTATACCCAAACAAAAATGTTTTA 
TCAACAATTGTTTGAGTTTTAATGCAAGAAAATTATCAGGAGTAGCGTGCAAAAATGACTGGAAGG 
CATGGTGTACTTCTGTGTGTACATACAAGTGGGTAATGCCTTATTGAACTCGTAATCACTCGTTTC 
AG 

DOMAIN 2G-3 (3rd part of domain g) 

GGACATATTGCTCATCGGAATGTGGATACATCTCGATCTCCGAGAGACATGCTGTTCAATGACCCC 
GAACACGGGTCAGAATCATTCTTCTATAGACAGGTTCTCTTGGCTCTAGAACAGACAGACTTCTGC 
CAATTTGAAGTTCAGTTTGAAATAACACACAATGCAATCCACTCTTGGACTGGAGGACATACTCCA 
TATGGAATGTCATCACTGGAATATACAGCATATGATCCACTCTTTTATCTCCACCATTCCAACACT 
GATCGTATCTGGGCCATCTGGCAGGCACTCCAGAAATACAGAGGTTTTCAATACAACGCAGCTCAT 
TGCGATATCCAGGTTCTGAAACAACCTCTTAAACCATTCAGCGAGTCCAGGAATCCAAACCCAGTC 
ACCAGAGCCAATTCTAGGGCAGTCGATTCATTTGATTATGAGAGACTCAATTATCAATATGACACA 
CTTACCTTCCACGGACATTCTATCTCAGAACTTGATGCCATGCTTCAAGAGAGAAAGAAGGAAGAG 
AGAACATTTGCAGCCTTCCTGTTGCACGGATTTGGCGCCAGTGCTGATGTTTCGTTTGATGTCTGC 
ACACCTGATGGTCATTGTGCCTTTGCTGGAACCTTCGCGGTACTTGGTGGGGAGCTTGAGATGCCC 
TGGTCCTTTGAAAGATTGTTCCGTTACGATATCACAAAGGTTCTCAAGCAGATGAATCTTCACTAT 
GATTCTGAGTTCCACTTTGAGTTGAAGATTGTTGGCACAGATGGAACAGAACTGCCATCGGATCGT 
ATCAAGAGCCCTACCATTGAACACCATGGAGGAG 

INTRON 2G/2H (SEQ ID NO: 155) 

GTATGTTTTGAGATCCACATAATCTTCTACCCTGTCTCATTTCTAATGCTCTTCAATACACAATTT 
ATATAGCCTTTGAGCTTCAGATGTATTACGGACAGGCATTACAGTATACATGTAATATGGTTTTCT 
GCTATTTGCAAAAATTGTGTCCTATCTCTGTTCAGATCATCATGGCGGTGACACCTAG 

DOMAIN 2H (SEQ ID NO: 159) 

GTCACGATCACAGTGAACGTCACGATGGATTTTTCAGGAAGGAAGTCGGTTCCCTGTCCCTGGATG 
AAGCCAATGACCTTAAAAATGCACTGTACAAGCTGCAGAATGATCAGGGTCCCAATGGATATGAAT 
CAATAGCCGGTTACCATGGCTATCCATTCCTCTGCCCTGAACATGGTGAAGACCAGTACGCATGCT 
GTGTCCACGGAATGCCTGTATTTCCACATTGGCACAGACTTCATACAATCCAGTTTGAGAGAGCTC 
TCAAAGAACATGGTTCTCATTTGGGTCTGCCATACTGGGACTGGAC 

Primary structure of the KLH2 protein

DOMAIN B 

GLPYWDWTMPMSHLPELATSETYLDPVTG 

YLFDSMIYAFEQEDFCDFEVQYELTHNAIHAWVGGSEKYSMSSLHYTAFDPIFYLHHSNVDRLWAI
WQALQIRRGKSYKAHCASSQEREPLKPFAFSSPLNNNEKTYHNSVPTNVYDYVGV
GMTMSELEEYIHKQTQHDRTFAGFFLSYIGTSASVDIFINREGHDKYKVGSFVVLGGSKEMKWGFD
RMYKYEITEALKTLNVAVDDGFSITVEITDVDGSPPSADLIPPPAIIFDVVR

DOMAIN C 

ADAKDFGHSRKIRKAVDSLTVEEQTSLRRAMADLQDDKTSGGFQQIAAFHGEPKWCPSPEAEKKFA 
CCVHGMAVFPHWHRLLTVQGENALRKHGFTGGLPYWDWTRPMSALPHFVADPTYNDSVSSLEEDNP 
WYHGHIDSVGHDTTRAVRDDLYQSPGFGHYTDIAKQVLLAFEQDDFCDFEVQFEIAHNFIHALVGG 
NEPYSMSSLRYTTYDPIFFLHRSNTDRLWAIWQALQKYRGKPYNTANCAIASMRKPLQPFGLDSVI 
NPDDETREHSVPFRVFDYKNNFDYEYESLAFNGLSIAQLDRELQRRKSHDRVFAGFLLHEIGQSAL 
VKFYVCKHNVSDCDHYAGEFYILGDEAEMPWRYDRVYKYEITQQLHDLDLHVGDNFFLKYEAFDLN 
GGSLGGSIFSQPSVIFEPAA 

DOMAIN D 

GSHQADEYREAVTSASHIRKNIRDLSEGEIESIRSAFLQIQKEGIYENIAKFHGKPGLCEHDGHPV 
ACCVHGMPTFPHWHRLYVLQVENALLERGSAVAVPYWDWTEKADSLPSLINDATYFNSRSQTFDPN 
PFFRGHIAFENAVTSRDPQPELWDNKDFYENVMLALEQDNFCDFEIQLELIHNALHSRLGGRAKYS 
LSSLDYTAFDPVFFLHHANVDRIWAIWQDLQRYRKKPYNEADCAVNEMRKPLQPFNNPELNSDSMT 
LKHNLPQDSFDYQNRFRYQYDNLQFNHFSIQKLDQTIQARKQHDRVFAGFILHNIGTSAVVDIYIC
VEQGGEQNCKTKAGSFTILGGETEMPFHFDRLYKFDITSALHKLGVPLDGHGFDIKVDVRAVNGSH 
LDQHILNEPSLLFVPGERKNIYY 

DOMAIN E 

DGLSQHNLVRKEVSSLTTLEKHFLRKALKNMQADDSPDGYQAIASFHALPPLCPSPSAAHRHACCL 
HGMATFPQWHRLYTVQFEDSLKRHGSIVGLPYWDWLKPQSALPDLVTQETYEHLFSHKTFPNPFLK 
ANIEFEGEGVTTERDVDAEHLFAKGNLVYNNWFCNQALYALEQENYCDFEIQFEILHNGIHSWVGG
SKTHSIGHLHYASYDPLFYIHHSQTDRIWAIWQALQEHRGLSGKEAHCALEQMKDPLKPFSFGSPY 
NLNKRTQEFSKPEDTFDYHRFGYEYDSLEFVGMSVSSLHNYIKQQQEADRVFAGFLLKGFGQSASV 
SFDICRPDQSCQEAGYFSVLGGSSEMPWQFDRLYKYDITKTLKDMKLRYDDTFTIKVHIKDIAGAE
LDSDLIPTPSVLLEEGK

DOMAIN F 

HGINVRHVGRNRIRMELSELTERDLASLKSAMRSLQADDGVNGYQAIASFHGLPASCHDDEGHEIA 
CCIHGMPVFPHWHRLYTLQMDMALLSHGSAVAIPYWDWTKPISKLPDLFTSPEYYDPWRDAVVNNP 
FAKGYIKSEDAYTVRDPQDILYHLQDETGTSVLLDQTLLALEQTDFCDFEVQFEVVHNAIHYLVGG
RQVYALSSQHYASYDPAFFIHHSFVDKIWAVWQALQKKRKRPYHKADCALNMMTKPMRPFAHDFNH 
NGFTKMHAVPNTLFDFQDLFYTYDNLEIAGMNVNQLEAEINRRKSQTRVFAGFLLHGIGRSADVRF 
WICKTADDCHASGMIFILGGSKEMHWAYDRNFKYDITQALKAQSIHPEDVFDTDAPFFIKVEVHGV 
NKTALPSSAIPAPTIIYSAGE

DOMAIN G

DHIAGSGVRKDVTSLTASEIENLRHALQSVMDDDGPNGFQAIAAYHGSPPMCHMPDGRDVACCTHG 
MASFPHWHRLFVKQMEDALAAHGAHIGIPYWDWTSAFSHLPAL^

SPRDMLFNDPEHGSESFFYRQVLLALEQTDFCQFEVQFEITHNAIHSWTGGHTPYGMSSLEYTAYD 
PLFYLHHSNTDRIWAIWQALQKYRGFQYNAAHCDIQVLKQPLKPFSESRNPNPVTRANSRAVDSFD 
YERLNYQYDTLTFHGHSISELDAMLQERKKEERTFAAFLLHGFGASADVSFDVCTPDGHCAFAGTF 
AVLGGELEMPWSFERLFRYDITKVLKQMNLHYDSEFHFELKIVGTDGTELPSDRIKSPTIEHHGG 

DOMAIN H (SEQ ID NO: 158) 

GHDHSERHDGFFRKEVGSLSLDEANDLKNALYKLQNDQGPNGYESIAGYHGYPFLCPEHGEDQYAC 
CVHGMPVFPHWHRLHTIQFERALKEHGSHLGLPYWDW 